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1. GENERAL SUMMARY 

This grant originally ran from 3/15/92 to 3/14/93 and was renewed for an additional 12 months. NASA 
accepted a no-cost extension for another year so that money remaining in the grant could be used to support 
Andrew Walker, the Pi’s graduate student, whose thesis involves work under the grant |Tfie main results of the 
grant were (1) finishing the manuscript of a proof of the completeness of the Poincard modes (Poincard, 1885; 
Bryan, 1889; Lebovitz, 1989; Greenspan, 1990) in an incompressible nonviscous fluid corotating with a rigid 1 
ellipsoidal boundary (Backus, 1994a), (2) partial completion of a manuscript describing a definition of helicity 
that resolved questions in the literature about calculating the helicities of vector fields with complicated topolo- 
gies (Woltjer, 1958; Moffatt, 1978; Berger and Field, 1984; Backus, 1995a) and (3) the beginning of a reexam- 
ination of the inverse problem of inferring properties of the geomagnetic field B just outside the CMB from 
measurements of elements of B at and above the earth’s surface. This last work has led to a simple general 
formalism for linear and non-linear inverse problems that appears to include all the inversion schemes so far 
( considered for the uniqueness problem in geomagnetic inversion (Langel, 1991). The technique suggests some 
new methods for error estimation that form part of this report * 

Projects (1) and (2) were described in the Progress Report for NAGW-2967 dated 12/4/92 and in appendices 
accompanying that report Work continues on project (2). We describe project (3) in summary below and in 
more detail in Appendix A of this report Appendix A is a manuscript accepted by JGR-red on condition that it 
be shortened and certain mathematical items be omitted. The author felt this would impair the intelligibility of 
the paper, so he withdrew it and plans to submit a longer version to Geophysical Journal International. 

2. THE INVERSION FORMALISM 

An inverse problem arises when we measure finitely many data y 1 , • • ■ ,y° and try to predict from them 
finitely many numerical properties z 1 , • • • ,z p of the earth. The earth is regarded as an unknown member x E 
of an infinite-dimensional model space X. Since D and P must be finite, we can introduce the data space Y 
and the prediction space Z consisting respectively of all real D -tuples y = (y 1 , • • • ,y D ) and all real P -tuples 
z = (z', ■ ■ • ,z p ). Both Y and Z are real linear spaces but X may be only a topological space. We assume 
that we know continuous functions F:X—>Y amd G:X—>Z that would enable us to compute the true data vec- 
tor y £ = F(x E ) and the true prediction vector z E =G(x E ) if we knew the true earth x E . In fact, we do not 
know x E or even y E but only the measured data vector y 0 = (y 1 , • • • ,y D ). It is related to y E by 

yo = y£+S*y + 8 s y (1) 

where 6*y is the random error in the data vector and 8 s y is the systematic error. There is no information about 
these errors except that 5 s yeV s , a known subset of Y, and that 5*y is a realization of a vector random 
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variable in Y whose probability measure % is known except for a few parameters to be determined from the 
data (Backus, 1989). This inverse problem as stated is well-known to be insoluble for most F and G (Backus 
and Gilbert, 1967). The remedy is to invoke prior information about x £ in the form of either a "hard" or a 
"soft" bound (terminology from Jackson, 1979). These are analogous to systematic and random errors. A hard 
bound is a subset U E of X for which we know that x E e U E . A soft bound is a probability measure |i £ on X 
such that for any measurable subset U of X we assign probability \i E ( U) to the event x E e U . In the geomag- 
netic problem, an example of a hard bound is that the ohmic heating rate of the core, a quadratic form in B, 
must be less that the heat flow out of the earth’s surface (Parker, 1972; Gubbins, 1983; see, however, Backus, 
1975). A bold extrapolator could extract a soft bound, a probability measure on X, from Constable and 
Parker’s (1988) suggestion that the gauss coefficients, the spherical harmonic expansion coefficients of the mag- 
netic scalar potential, are independent normal random variables whose variance depends only on degree /. 
Constable and Parker applied the Kolmogorov-Smimov test (Kendall and Stuart, 1979) to degrees 2</<12 
because the axial dipole is well-known to be anomalously large, while above / = 12 the gauss coefficients of the 
whole earth are believed to include an appreciable crustal contribution and so do not directly describe the core. 
Another soft bound might be a Bayesian observer’s description of prior beliefs by means of a subjective prior 
personal probability measure for x E in X. Franklin (1970) developed an inversion theory based on soft prior 
information, but did not discuss the source of that information. Backus (1970a) considered both hard and soft 
prior bounds and suggested that a hard bound might be replaced by a suitably chosen soft one. Jackson (1979) 
investigated this "bound-softening" proposal in some detail, and Gubbins (1983) used it to estimate gauss 
coefficents and their errors at the CMB. It is now known that Backus’s suggestion of 1970 was wrong, and that 
softening a hard quadratic bound necessarily introduces spurious prior "information" about x E when dim X = °° 
(Backus, 1988, 1989a). 

Rigourous inversion schemes have been addressed to one of two audiences. The Bayesians are willing to use 
probabilities to quantify personal beliefs. The frequentists hold that probabilities are meaningless unless they 
describe frequencies of outcome in experiments that are at least conceptually repeatable. In inversions frequen- 
tists harden all soft bounds and random errors by computing confidence sets and confidence intervals (Backus, 
1989; Donoho, 1989; Stark, 1992). Hardening soft bounds does not introduce spurious new "information" about 
x E or 8* y (Backus, 1989). Bayesians have tried to soften all hard bounds and systematic errors into probability 
measures (Jackson, 1979; Gubbins, 1983; Tarantola, 1987; Backus, 1988a). Although this bound softening is 
now known to be unacceptable in spaces of infinite dimension, one might wonder whether it would work in 
spaces of finite dimension, like Z. Backus (1995b) shows that in finite dimensional spaces replacing an upper 
bound on a quadratic form by a normal probability measure introduces a spurious lower bound on the form. 
That is, the softening generates a claim that one can estimate the actual value of the quadratic form to within a 
certain factor at a certain error rate. For example, in three dimensions this bound-softening leads to a lower 
bound that is 1/20 of the upper bound at the 90% confidence level, and 1/3 of the upper bound at the 50% 
confidence level. 

These complications motivate the search for an inversion formalism able to combine hard and soft bounds on 
x E with random and systematic data errors in a way acceptable to both frequentists and Bayesians. The scheme 
should not require a commitment to either philosophy of probability, and should produce a result that could be 
interpreted in the light of either. Here we describe what we believe to be such a scheme. In the next sections 
we describe how we propose to use it in the geomagnetic inversion problem. 

Our formalism requires the contstruction of an object that we will call a truncated approximation (TA) to the 
inverse problem (X ,F ,G). A TA is a quintuple {X TA ,P TA ,F TA ,Q TA ,Gta ) in which X TA is a subset of a 
finite-dimensional topological space and the last four entries are functions. Their domains and codomains are 
as follows. 

P ta :X-*X T a, F ta ' %ta T . Qta’-Y—^Yta, Gj A : X TA — » Z , (2) 
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where Y TA = F TA (X TA ). We put no restrictions on these functions except to demand that Pj A and Qta be con- 
tinuous, that the restriction of Qta to Y TA be the identity map on Y TA and that F TA have a continuous inverse, 
F TA : y ta — ta • We define Hja = G ta ^ Fta o Qta nnd 

fTA = Fta O Pta -F , Sta = Fjja o Pta ~ G , StaY = /ta( x e ) » &taZ = 8ta(xe) ■ (3) 

The true value of the desired prediction is z £ = G ( x £ ). Our formalism leads to the conclusion that 
z E = zq-Sz where z 0 = H TA (y 0 ) and 

8 z = H TA (yo)-H TA (yo-8 R y-5sy+^TAy) + § 7 vtZ- (4) 

Equation (4) simplifies considerably when H TA is linear. It is the final result of the inversion and is offered to 
frequentists and Bayesians for their own interpretations. It describes the prediction error 5z as a combination of 
systematic and random errors. For example, if the prior information is a hard bound, x E e U E , then 
87-4 z e g T A ( U E ) , so 87 * z is a systematic error. If the prior information is a probability measure \i E on X then 
S M z is a random error with probability measure |i £ o gf A . Here gf A denotes the mapping from measurable 
subsets of Z to measurable subsets of X; the point mapping gf A :Z->X does not exist because dim Z <00 and 
dim X = Frequentists can convert the random errors in 8 z to confidence sets (hard bounds) and thus obtain 
a confidence set for 8 z (Backus, 1989; Donoho, 1990; Stark, 1992). Bayesians can convert systematic errors in 
Sz to random errors if dim Z< 2. If dim Z >3, softening hard bounds introduces enough spurious "information" 
that Bayesians will probably want to describe the error 8 z as the sum of a systematic error with known 
confinement set and a random error with known probability measure. 

Clearly, to apply the foregoing formalism to any particular problem requires effective computational use of 
the mathematical and physical structure of that problem. One useful tool in this endeavor is the notion of data 
compression. Suppose we have M linearly independent linear functionals on y, K‘:Y->R for / = 1, ■ ■ • ,M. 
(R denotes the field of real numbers.) Let Y be the space of real M -tuples and define the linear function 
K:Y->Y by K(y) = (K\ y), • • • ,K M ( y)). If (X TA ,P TA ,F TA ,Qta >Gta ) is a truncated approximation for 
the inverse problem ( X,F,G ) then (X TA ,P TA ,Ko F TA ,Ko Q TA ,G TA ) is a truncated approximation for 
(X ,Ka F ,G ). In the geomagnetic problem we will see later that the compression operator K can be chosen so 
that there are obvious bases in X TA and K( Y TA ) relative to which the matrix of the compressed data function 
K o F ta is nearly diagonal. This fact makes it easy to trace error propagation and speeds up the matrix inver- 
sions required to treat the real data. 

Another essential tool in applying the formalism to estimate the error 8 z in the prediction vector is the avai- 
lability on the data space Y of a dot product generated by the probability measure % for the random error 8 £ y 
(Backus, 1989). Relative to this dot product, the variance tensor of 8 £ y is the identity tensor on Y . This situa- 
tion is preserved under data compression. If the inverse problem (X ,F ,G) is replaced by a compressed ver- 
sion (X ,Ko F ,G) then the compressed data space Y has a natural dot product generated by the probability 
measure = T)« o AT -1 for the compressed random error 8 £ y = /f( 8 £ y). The compression mapping K is an 
isometry between Y and the orthogonal complement of the null space of K. 

On the infinite dimensional model space X , a natural dot product can be constructed from the prior informa- 
tion if X is a linear space and the prior information is a hard quadratic bound. Hard non-quadratic prior bounds 
may make X a Banach space or simply a topological linear space. If the prior bound on x £ is soft, a probabil- 
ity density p. £ on X , then it generates no natural dot product on X (Freedman, 1963; Backus, 1987). However, 
in any truncated approximation (X TA ,P TA ,F TA ,Q TA ,G TA ) to (X ,F ,G), the probability measure Jx £ o Pf/t 
generates a natural dot product on the finite dimensional space X TA . In the inversion, this dot product can be 
used in place of a dot product on X . 
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3. THE NEED FOR ERROR ESTIMATES IN GEOMAGNETIC MODELING: BACKGROUND 

The grant was focussed on the CMB, but the satellite and surface data cannot be analyzed without consider- 
ing together all five contributions to the magnetic field measurements: the core, the mantle, the crust, external 
electric currents and measurement errors. The inversion formalism described above seems to meet rather well 
the needs of this modelling project. 

Each of the five field sources can be modelled deterministically or stochastically, although deterministic 
models of measurement errors (eg a priori estimates of hard bounds) seem unlikely to be as useful as stochastic 
models in which a few parameters are inferred from the fit to the data. 

So far, the PI has tried to incorporate neither induced mantle currents nor external currents into the inversion 
process. There are some deterministic descriptions of both. McLeod (1992, 1994) and Pulkinnin et al (1995) 
discuss fields of external currents, and Backus (1982) and Benton and Whaler (1983) discuss mantle currents. 
In the final analysis of the real data, such descriptions will have to be incorporated, or replaced by stochastic 
substitutes. If the magnetic jerks (Ducruix et al, 1985) turn out to be real, they will provide a very useful upper 
bound on mantle conductivity and the contribution of mantle currents to the external field. McLeod (1992, 
1994) independently estimates deep mantle conductivity by focusing on the transfer functions from external to 
internal fields of various harmonic degrees. 

To date, the Pi’s inversion work has been restricted to the core, the crust and measurement errors. Recent 
advances in all three areas are described in the next section. In the remainder of this section we list some of 
the geophysical questions about the core whose answers depend on reliable modelling and credible error esti- 
mates. 

Gubbins and Bloxham (1987) believe that their inversion of the data resolves resemblances between magnetic 
features in the northern and southern hemispheres of the CMB that suggest the inner-core-grazing Taylor 
columns in Busse’s (1983) weak-field dynamo theory. On the other hand, Glatzmeier and Roberts (1995), in a 
remarkable simulation of the full dynamo equations, find a strong-field dynamo. Whether the patterns observed 
by Gubbins and Bloxham are required by the data depends crucially on the errors in estimating the radial field 
B r on the CMB. The errors quoted by Gubbins and Bloxham do not settle this issue, being simply an estimate 
of how well their model fits the data, not of how far it is from the real earth (Backus, 1988a; Stark, 1992). 

Another question addressed by B r on the CMB is the on-going controversy about whether, during periods 
shorter than one or two centuries, the core fluid velocity v affects B approximately as if the core were a perfect 
electrical conductor. When Roberts and Scott (1965) proposed this "frozen flux" approximation, they pointed 
out that it reduces the dynamo equation at the CMB to 3, B r + V s (B r v s ) = 0, where V s and v s are the 
tangential parts of V and v (in fact, v s = v at the CMB). This equation of Roberts and Scott has been the basis 
for almost all subsequent attempts to estimate v s at the CMB. These attempts employ satellite and surface 
magnetic data to estimate B r and d,B r on the CMB. Backus (1968) describes some ways to use those data to 
test the frozen flux hypothesis, but those tests require error estimates for the B r and d,B r inferred at the CMB. 
Bloxham and Gubbins (1986) believe the data require abandoning the frozen flux approximation. Again, how- 
ever, their error estimates are measures of how well their B r on the CMB fits the surface data, not how close it 
is to the real earth. Indeed, Constable et al (1993) have constructed frozen-flux models that do fit the recent 
data. The data before 1850 are only directions and produce an incompletely understood intrinsic non- 
uniqueness in determining B r even at the earth’s surface, much less at the CMB (Proctor and Gubbins, 1990). 

If the frozen-flux approximation is accepted, the Roberts and Scott equation extracts from B r and d,B r on the 
CMB information about \ s there. Backus (1968) described this extractable information as follows: there are 
two scalar fields, / and g , on the CMB such that there B r v s = V s / + f x V 5 g . Exact and complete values for 
B r and d,B r on the CMB determine / up to an irrelevant constant, and g can be chosen arbitrarily except that 
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V*g =fxV s / on the null-flux curves (where B r = 0). Since 1968 several physical hypotheses have been 
advanced to obtain more information about g . Voorhies and Backus (1985) suggested that v s might be approx- 
imately steady over one or two centuries (more precisely, if x was the time-scale for changes in v s and X* and 
X fi were typical length scales for \ s and B r then (v s x) -1 < X^-t-X^ 1 ). Bloxham (1990) suggested that the 
upper core might be so stably stratified that V s • v s = 0. LeMouel et al (1985) suggested that the tangential part 
of the Lorentz force just below the CMB might be much less than the tangential part of the Coriolis force, so 
that Vs ‘ (ft v s ) = 0 on the CMB, p being the cosine of colatitude. All three physical hypotheses remove much 
of the ambiguity in v s and Voorhies removes it all (Voorhies and Backus, 1985; Backus and LeMouel, 1986; 
Bloxham, 1990). Unfortunately, the data show that V 5 ■ v s and V 5 (pv s ) cannot both vanish, so they cannot 
be used together (Bloxham, 1990). To verify any of these physical hypotheses, one needs error estimates for B r 
and d,B r on the CMB or for B r at different epochs. 

Gubbins and Bloxham (1987) see in calculated maps of v s evidence of local upwelling just under the CMB. 
They suggest that this indicates local high temperatures in the lower mantle. Again, their error estimates are 
formal rather than substantive. 

It is the Pi’s conviction that believable error estimates for B r and perhaps d,B r at the CMB would either 
resolve many of these controversies or show that they are not resolvable with present data. 

4. PROGRESS ON GEOMAGNETIC MODELING TO DATE UNDER THE GRANT 


The Pi’s long-term goal is to use the satellite and observatory data to develop a defensible model of the 
sources of the geomagnetic field and the errors of measurement. As a first stage in this program, the existing 
models deserve careful study. Under NAGW-2967 the PI and his student, Andrew Walker, have begun a reex- 
amination of the power spectrum of the external field, as derived by Langel and Estes (1982) and by Cain et al 
(1989). We hope to leant enough from this study to undertake the second stage: deciding what parts of the 
model should be stochastic, what parts should be deterministic, and fitting the model to the original magnetic 
measurements so as to obtain predictions with error estimates about the geomagnetic field. Our interest is in 
the CMB, but the project must necessarily develop information about the other field sources. We hope the 
inferences about sources produced in this first stage will guide the second stage. Then those inferences must be 
rechecked against the results of the second stage. 

Most of our work under NAGW-2967 has been on the first stage. We have produced partial results for the 
core, the crust and the measuring errors. We describe each in turn. 

4A. The Core 

For the core the issue is to find credible and useful prior information. One hard bound is now well-known: 
the total ohmic heat production in the core is probably less than the heat flow out of the earth’s surface (Parker, 
1972; Gubbins, 1983; Backus, 1988a). There is a possibility that this bound could be exceeded (Backus, 
1975), so we propose also to study another hard bound we have recently obtained from the virial theorem: the 
total magnetic field energy must be less than the unsigned gravitational self-energy of the earth. Backus (1989) 
showed that the surface and satellite data can provide no information about the value of B r at single sites on 
the CMB if the prior information is a hard bound on heat flow or energy. Heat flow bounds do permit estima- 
tion of the flux of B r out of regions P on the CMB that have finite area and a smooth boundary curve dP , but 
the energy bounds do not It is not known whether an energy bound permits estimating the flux through P 
when B r - 0 on dP (ie if dP is a null-flux curve). 

Neither of the hard bounds just mentioned is tight, and only the dense distribution of satellite data makes 
them useful (Backus, 1989). The PI has recently found some evidence for a rather speculative soft bound that 
may have the advantage that observers who accept it can estimate point values of B r on the CMB, except 
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perhaps in a region of Lebesgue measure zero (vanishing total area). This is one of the theoretical questions 
yet to be addressed. At any rate, this new speculative soft bound is much tighter than the two hard bounds, and 
so deserves attention. Backus (1994b) discussed it at the 1994 fall AGU meeting, and that discussion is 
detailed in Appendix B. Here we give a summary. In the rest of subsection 4A., B will refer to the field of the 
core, uncontaminated by other sources. For reasons mentioned later, it is generally believed that at spherical 
harmonic degrees of 12 or less, the field we see at the surface is mostly the core field. 

The new bound on B is based on a fresh attempt to estimate the depth to the core magnetic sources via the 
observed decrease of energy with harmonic degree l in the LML spectrum, the spectrum introduced by Liicke 
(1957), Mauersberger (1956) and Lowes (1966). The attempt is suggested by the observation of Constable and 
Parker (1988) that the non-dipole gauss coefficients of the present geomagnetic field at the CMB, as far as they 
can be detected in the MAGS AT data, appear to be distributed like independent normal random variables with 
zero mean and variance dependent only on degree. Any random field with such statistics is statistically invari- 
ant under all rotations about the center of the earth, a property of the geomagnetic non-dipole field recently sug- 
gested also by Courtillot et al (1992). 

To describe what we have done and propose to do in this line of attack, we accept the following notation: if 
x is any random variable, £[jc] will denote its expected value. R and C are the fieldss of real and complex 
numbers, and if z e C then z* is its complex conjugate. The spherical surface of radius b concentric with the 
earth is denoted S(b). We will consistently take S ( a ) to be the CMB and S(c) to be a sphere near the MAG- 
SAT orbits. Cain et al (1989) take c = 6790 km for MAGSAT. If / :S(6)->C, then ( f)s(b ) is the average of 
/ with respect to area on S(b). For 0 </ <°o and -l <m< l let Y” be a real or complex spherical harmonic 
polynomial of degree / and longitudinal order m, normalized so that ( T/ m ( T* )* } S q) = &ir 5^ . For r>a let 
Y/ m (r) be the coefficients in the expansion 

Br ('?) = £ E Tf 00 17(f), (5a) 

1=1 m=-l 

where r is an arbitrary unit vector. The Schmidt semi-normalized gauss coefficients g”(r), the coefficients 
usually given in published reference models, are related to the y” hy the equation 

(21 + 1) I/2 y”(r ) = (/ + 1) g”(r ) . (5b) 

The pre-Maxwell equations assure that r l+2 y, m (r) is independent of r. On S(r) the LML spectrum is 

R(r,l) = (l + 1) £ I gr(r)\ 2 . (5c) 

m=-/ 

There seems to be no justification for the factor ( / + 1) in (5c) except habit and the independent demonstrations 
by Liicke, Mauersberger and Lowes that R(r,l ) = (B 2 ) S ^) where B, is the part of B produced by spherical 
harmonics of degree / . This looks like a physically significant quantity, but so far no dynamo theory has sin- 
gled it out. The statistical model we are about to consider suggests that a different factor would be physically 
more useful in (5c), a conclusion reached also by Jackson (1994a). 

Constable and Parker (1988) observed that if 2</<12 and -l <m< l then the present-day values of the 
quantities ( 2/ +1)( / -t-l) -1/2 y/" ( a ) pass the Kolmogorov-Smimov (K-S) test for independent identically distri- 
buted (iid) normal random variables. Andrew Walker found the same result when he multiplied the Yf(a) by a 
few other factors algebraic in / , so the PI decided to look for a model with more structure, one that might jus- 
tify a particular choice of factor. A physically reasonable place to start seemed to be the non-dipole part of B r , 
because the uniqueness assertion in the Neumann problem makes the external B a direct expression of B r just 
above the CMB, and this B r is the same as the B r produced by the core dynamo just below the fluid boundary 
layers there. The result of Constable and Parker shows that, at least for 2<l < 12, B r can be regarded as a ran- 
dom field on 5 ( a ) whose statistics are invariant under all rotations about the center of the earth. Their choice 
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of multiplier for y”(a) amounts, as we shall see, to the choice of a particular covariance function for B r (r) on 
S(a). It appeared reasonable to look first at the simplest covariance function, the delta function. That is, we 
decided to explore the hypothesis that there was a spherical surface S(w) on which B r was gaussian white 
noise. This is equivalent to assuming that the yP( w ) are iid gaussian with the same variance, which we label 
a 2 . We left w as a free parameter because the depth to the sources is often estimated by fitting a straight line 
to the observed ln/?(c,/) defined by (5c) (Langel and Estes, 1982; Cain et al, 1989), and one might hope that 
fitting the data with our model would produce w =a . 

Our rather naive model makes a definite prediction about the LML spectrum. Since Y/ m ( c) = 
( w/c) ,+2 y/"(w), then from (5b,c) we have 

R(c,l) = (21+l)(l+iy 1 (wlcf^ £ lYfC*)! 2 . 

m=>-/ 

Our model makes R(c,l) a random variable, and our hypothesis that B r is white noise on S(w) leads to 

ln/?(c,/) = ln[2(w/c) 4 a 2 ] + / ln[(w/c) 2 ] + ln(2/+l)-ln(/+l)+C, 

where exp(2£,) is a chi square random variable with 2/+1 degrees of freedom. The expected value of is 

£[£ / ] = xj/( / + Vi) and its variance is V[£ ; ] = 3,vy(/ + 1A), where is the digamma function, \y(z) = 
d, lnT(z) (Abramowitz and Stegun, 1964). Thus 

E (In R(c,l)] = In A +/ lnv + 2(/+2)ln(a/c)+ln(2/+l)-ln(/+l)+£ [£, ], (6a) 

V[ln/?(c,/)] = V[C,] = a i V(/+ 1 A) (6b) 

where v = (w/a) 2 and A =2v 2 o 2 . 

In the range 2<l <12, where on S(c) the non-dipole field of the core may be relatively uncontaminated by 
the crust, our model makes ln/?(c,/) a random variable with variance V [£, ]. The obvious course is to esti- 
mate the free parameters A and v from the observed values of In/? (c,/) by choosing A and v to minimize the 
weighted mean square error, 

XV(C/r 1 {ln/?(c,/)-£[ln/?(c,/)]) 2 . (7) 

1=2 

Cain very kindly made available by ftp the gauss coefficients used in Cain et al (1989), and with these we com- 
puted the A and v which minimize (7). The minimizing values are v = 0.766 ±0.031 and A = (4±2)x 10 9 nT 2 . 
The error estimates are from the statistical model. The model predicts that the residual (7) is a realization of a 
random variable distributed approximately like chi-square with 1 1 degrees of freedom. The actual value of (7) 
with Cain’s gauss coefficients is 7.01. Andrew Walker finds that the best-fitting v and A produce values of 
Yi "( w)/ct for 2<l < 12 and -l <m< l that do pass, at the 60 per cent confidence level, the K-S test for iid nor- 
mal random variables with mean zero and variance 1. Since ct was calculated from the sample, the original 
analytic K-S test was replaced by a Monte Carlo calculation (Lilliefors, 1967; Mason and Bell, 1986). The 
uncorrected analytic K-S test gave a confidence level of 70 per cent 

With v = 0.766 we have a-w = 436±60 km. If there really were a sphere S(w) on which B r is white 
noise, it would be well below the fluid boundary layers at the CMB, and we could not use the pre-Maxwell 
equations to "see" it from S ( c ). Our model must be abandoned. The calculations it generated, however, have 
another, more plausible interpretation obtained by extending to the CMB a proposal of Jackson (1990, 1994a) 
for stochastic modelling of the crust. If there were a white noise sphere S(w) with w <a, then on S(a) the 
field B r would be random gaussian with mean 0 and variance given by either of 

E [yr(a)y?'(a)* ] = a 2 v !+2 8„ 8 W (8a) 

E[B r (ar)B r (aS)] = o 2 v 2 f i (2/+l)v' P,(f S) 

1=2 


(8b) 
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where Pi is the / -th Legendre polynomial. If B r (at) is a random non-dipole field on 5(a) with mean 0 whose 
statistics are invariant under rotation and whose covariance function is (8b), this field will produce exactly the 
same statistics on 5(c) as does the non-dipole field whose statistics are white noise on 5(w). We interpret our 
fit of (6a) to the data on 5 ( c ) as an indication not that there is an 5 ( w ) below the CMB on which B r is white 
noise, but rather that B r behaves on 5(a) statistically like a rotationally invariant random non-dipole field with 
mean 0 and covariance function (8b). 

Any covariance function for B r on 5(a) that depends only on t •§ specifies the statistics of B r as a normal 
random field on each 5(r) with r>a. The Laplace equation assures that the resulting covariance functions for 
r>a constitute a semigroup (Hille and Phillips, 1957) whose generator is (8b) with a = 1 and with the lower 
limit in that sum replaced by l =0. Its relationship to the semigroup’s generating function makes (8b) an alge- 
braically convenient choice for the covariance function of B r on S ( a ). 

As Jackson (1990, 1994a) is careful to point out for the crust, (8b) for the CMB is probably a rather crude 
representation of the true covariance function. The only properties of that function captured by (8b) are its 
amplitude and its correlation length. The series can be summed analytically (Backus, 1986, used by Jackson) 
and describes a narrow positive peak with broad flat negative wings (see Appendix B, Fig. 9). The half height 
of the peak occurs at a half width of about 1 - v radians. A precise calculation for v = 0.766 gives a half width 
of 12 degrees, corresponding to a correlation length of 750 km on the CMB. If this picture survives, we have 
captured a statistical property of the geodynamo that, in the past, has been interpreted as an anomalously great 
depth to the sources. There is an alternative to our suggestion. Cain et al (1989) have brought S(w) nearly to 
the CMB by correcting the core LML spectrum for the crustal contribution and by choosing for their random 
function a Funk-Hecke transform (Backus, 1986) of B r . This transform seems to us less natural than B r itself 
as a white noise candidate for locating source depth, although it does have tradition in its favor. 

Our whole picture needs further examination, some of which we describe below, and which we propose to 
carry out as the work continues. If the picture does hold up, and if we are willing to extrapolate it to degrees 
higher than 12, a preliminary study indicates that the prior information it provides will permit the estimation of 
point values of B r on 5 ( a ), except for a subset of 5 ( a ) with zero Lebesgue measure (area). 

4B. The Crust 

Because of a knee in the LML spectrum around harmonic degree / = 15, Langel and Estes (1982) and Cain et 
al (1989) suggested that the magnetic signal outside the earth is dominated by the core for / < 15 and by the 
crust for / > 15. One group of workers (Meyer et al, 1983, 1985; Hahn et al, 1984) modelled the crust by sub- 
dividing it into geological provinces of different susceptibilities. They could fit the LML spectrum in the range 
of harmonic degrees 15 </ <35. A second group (Counil et al, 1991) produced a good fit for 15 </ <60 by 
using only two provinces, continents and oceans. Jackson (1990) pointed out that neither model fits the actual 
values of the gauss coefficients in those ranges of l . 

If a geological model of the crust were found to fit the satellite data in the range of / where those data are 
dominated by the crust, one could plausibly extrapolate the gauss coefficents of that model down to / = 1 and 
thus isolate the core gauss coefficients in the range where they obscure the crust. In the absence of such a 
deterministic model of the crust, Shure et al (1985) treated crustal signals on 5(c) as systematic errors. Backus 
(1989) observed that tighter error bounds on predictions could be obtained by treating the crustal "errors" as 
random. Backus, however, took the crustal signals measured at different points on 5(c) to be independent, 
thus assuming that the correlation length of B on 5(c) was zero. Jackson (1990) pointed out that a random 
crustal magnetic signal on 5(c) is produced by a random magnetization M in the crust, and that even if the 
horizontal correlation length of M in the crust were zero, the geometry of the Laplace equation would produce 
a correlation length for B on 5(c) at least as large as c-b , S(b) being the surface of the earth. 

A random error in B on 5(c) with an appreciable correlation length complicates both the task of numerical 
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inversion and the attempt to construct simplified models whose behavior gives insight into the numerical results 
of the real data inversions. The problem is that most inversion schemes invoke the inverse of the variance 
matrix of the random errors. The inversion scheme in the first section of this proposal uses that inverse in 
order to calculate the natural dot product on the data space. As Jackson (1990) points out, the variance matrix 
of the random crustal signals at the measuring sites on S ( c ) will be a full (non-sparse) matrix when the corre- 
lation length of 8*B is appreciable. In geomagnetic modeling, numerical inversion of non-sparse matrices is 
quite feasible on modem computers, but there are calculations in which one might want the singular value 
decomposition of such a matrix (Backus, 1989). These do begin to be time-consuming. 

Jackson (1994a) has found a physically plausible class of simple stochastic models for M which partly allevi- 
ate the problem of a non-sparse error variance matrix. These models do generate correlations between measure- 
ments of B at any two sites on S(c), but at least they do not generate correlations between different y, m ( c ) in 
(5a). Jackson’s idea is to posit a stochastic M such that for any positions r = rf and s = s§ in the crust 

E[ M(r)] = 0 and (9a) 

£[M(r)M(s)] = 8(r-j)F(f'8)I (9b) 

where I is the three-dimensional identity tensor. Equation (9b) is an approximation based on the assumption 
that the vertical correlation length X of M is much shorter than h , the thickness of the magnetized crust Jack- 
son shows that his conclusions are essentially unchanged if X»h, so that 8(r -s) is replaced by /T 1 . 

To exploit (9b), Jackson defines coefficients F, by writing the Legendre expansion of F in the form 

FOt) = L(2/ + l)F, ^(lO (9c) 

1=0 

where P t is the Legendre polynomial of degree l . Clearly (9a) implies 

E[yr(c)l = 0. (10a) 

Jackson shows that if the magnetized part of the crust lies between the spherical surfaces S(b) and S(b-h) 
then for 1,1' c blh equations (9) imply 

E [ yf( b ) Yr*( b)*] = nZhb~ 2 l F w 8, r 8 W (10b) 

where Po is the magnetic permeability of the vacuum. 

Jackson proposes a simple preliminary choice for the F(p) in (9), namely that 

F,=Kv‘ (11) 


where K and v are positive constants and v< 1. One advantage of this choice is that the series (9c) can be 
summed explicitly (Backus, 1986) to give 

F(p) = K (1-v 2 ) (l + p 2 -2pv) -:V2 . (12) 

Jackson’s F has two free parameters, K which determines its amplitude and v which determines the width of 
its peak. If we write p = cos 9 then 9 = 0 at the center of the peak, and at half peak height, very roughly, 
9 = 1-v radians if v is close to 1. Thus Jackson’s Ansatz (11) describes the two most important features of 
any covariance function F . 

Many features of Jackson’s model (9) are more general than first appears. Schur’s lemma (Weyl, 1950, 
pi 52) shows that if M is any random magnetization of the crust whose statistics are invariant under all rotations 
about the center of the earth, then on S(c) 


F[y, m (c)] = 0 


and 


E [yr(b)yp'(b)* ] = 4>, 8 ;r 8^ 


(13a) 
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for some constants <}>;. The PI has shown that if M has rotationally invariant statistics, then E[M(r)M(s)] has 
four independent components that produce magnetic signals on S(c). If their vertical correlation lengths are all 
much less than the thickness of the magnetized crust, then one of them is (9b). If all four have roughly the 
same horizontal correlation function, and if 15 «/ cblh, laborious algebra shows that, approximately, 

<t U=KIF,^ (13b) 

for a suitably chosen constant K. Therefore, if the statistics of the crust can be treated as rotationally invariant, 
Jackson’s model (9) seems likely to be adequate. 

Although Jackson’s model leads to a non-sparse error variance matrix for the data, it does diagonalize the 
variance matrix of the gauss coefficients. We can exploit this by an appropriately chosen compression of the 
data. Define vector spherical harmonics on 5 ( 1) as follows: 

Ur(f) = [-Vr-'- 1 //r(f)] r=1 ; Vf(f) = [V//, m (r)], =1 ; W ,"(f) = [rxV//,"(r)] r=1 . (14) 

Then (u r-(vr> ) S(l) = <ur-(w?y > S( i) =(vr-(wr'r > s <,> = o, <ur-(ur> > S0) = (/ +n8, r s w , 
(vr-(vp')* } sm = l s„ 8 mm - , (wr-(wp')* ) S(1) = / (/ + 1)(2/ + 1 r 1 S„ 8 W . On 5(c) the magnefic field is 

B(C?)=f; £ [ g, m ( C ) U; m ( f ) + h,”( c ) V/*( f) + *,"( C ) W, m ( f ) ] (15) 

(=1 m=-l 

where the terms in U, V and W describe respectively the fields generated by currents inside, outside and cross- 
ing 5(c) (Stem, 1976; Kosik, 1984; Backus, 1986). Backus shows that the g/"(c) are the Schmidt semi- 
normalized gauss coefficients. From the orthogonality relations for the vector spherical harmonics (14) 

(/ + l)gr(c) = (4jc)- 1 f <M(f)B(c?)-ur(f)*. ( 16 ) 

S(l) 

Equation (16) suggests a way to compress the data. Suppose B has been measured at sites r, , • • • ,r 0/3 , so 
that the data space Y has dimension D . Write r, = r t i, and assign to each unit vector r, a small patch co, on 
5(1) with area 4 jc I to, I. We ask that r, e to, and that every f e 5(1) belong to exactly one to,. Choose an 
integer L for which L(L +2) < D/3, and for all l, m such that 1 < / <L and -l <m<l define g/"( c ) by 

D/3 

(/ + D gr(c) = £ Ico. i (ri/c) ,+2 B( r, ) ■ U/"( f,- )* . (17) 

i= 1 

Evidently gT(c) depends linearly on the data. Usually, the g/"(c) for different / and m will be linearly 
independent functionals on Y. Taken together, they will produce a data compression K:Y—>Y, where Y is the 
space of L(L+2)-tuples of complex numbers (or real numbers if the harmonics H” were chosen to be real). If 
the observation sites r, are all nearly on and more or less evenly distributed over 5(c), and if the B in (15) 
does not include measurement error, g”( c ) is likely to be a reasonable approximation to g/"( c ). Then the sta- 
tistical properties of the contribution to g™(c) from a rotationally invariant random crust will be approximately 
(10). We have begun a study of the accuracy of this approximation, but much remains to be done. Preliminary 
results suggest that the approximation will be acceptable for L <45. For higher L it appears likely that the data 
are dominated by the errors of measurement (Cain et al, 1989). 

Langel et al (1982a) have shown how to use satellite data at different epochs to eliminate local crustal 
anomalies from data taken at magnetic observatories. The availability of satellite vector data at two different 
epochs (MAGSAT adn OERSTED) makes the proposal of Langel et al particularly useful. 

4C. The Errors of Measurement 

The simplest assumption about the errors in the D observations of components of B at the D 13 locations 
c?i , • • • , cffl /3 is that those errors are iid gaussian with mean 0 and variance cr 2 . Then the errors of meas- 
urement, B, are governed by the statistics 
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E [8j? B(cf/)] = 0 £[8 Jf B(cf,)8^B(cf / )] = a 2 lS // . (18) 

If we want to use the data compression (17) in order to deal with the non-zero correlation length of the crustal 
signal, we must also see how that compression affects the errors of measurement. Let 8^g, m (c) be the error in 
g/"(c) produced by the measurement errors. Then equation (17) implies 

D/3 

(Z + 1)(Z' + 1)E [S R gr(c)S R g?Xc)] = c 2 £ I to, I 2 • (19) 

«=o 

If the I to, I are all nearly the same, they are all nearly Dt 3. Then we can approximate the sum in (19) by an 
integral over S (1) and obtain 

( / + 1) E [ 8* £/"( c ) 8* gf( c ) ] = a 2 (D /3)" 1 8, f 8 W . (20) 

Holme and Bloxham (1995) have made a very important improvement in modelling the statistics of the 
errors of measurement. They point out that B( c ?,) consists of three parts: magnetometer errors, satellite 

tracking errors, and satellite orientation errors. The preceding paragraph describes only the magnetometer 
errors. Holme and Bloxham note that tracking errors may introduce correlations for i *j , but they have not yet 
studied these. For small orientation errors in which all three Euler angles at each site are iid with variance 0 2 
radians 2 , they have shown that 

£[8*B(cf,)] = 0, £[8*B(c^)8*B(cP y )] = 0 2 (fi 2 I-BB)8 iy , (21) 

where B is the geomagnetic field at c f, . The full error-variance tensor for the compressed data g”( c ) is the 
sum of (18) and (21) plus whatever contribution comes from tracking errors. It is easy to show that, if B is 
approximated by the axial dipole field of the earth, then the effect of (21) on the errors in the compressed data 
(17) is to introduce nonzero correlations between 8* g/"( c ) and Spg” 2 (c) for all / and m. 


4D. A First Stage Synthesis 

Cain et al (1989) concluded that the anomalously large depth below the CMB previously found for the mag- 
netic sources in the core (174 km, Langel and Estes, 1982; 147 km, Meyer et al, 1983) could be largely 
removed by including the crustal contribution to the LML spectrum R(c,l) at all spherical harmonic degrees Z, 
even those so low (Z < 14) that the core signal dominated the satellite data. This observation by Cain et al has 
the added attraction that it suggests we can use extrapolation to estimate the crustal contribution to R(c,l) in 
the range of Z dominated by the core. On the other hand, the statistical model we propose in section 4A calls 
for an apparent source depth 436 km below the CMB. This is so large that we prefer to interpret it as 
representing a statistical regularity of B r on the CMB rather than a white noise source for B r below the CMB. 
Our model is supported by the fact that the gauss coefficients with 2 < Z < 12 do pass the Kolmogorov-Smimov 
(K-S) test for randomness posed by the model. However, the model described in section 4A has no crustal 
correction, and the work of Cain et al (1989) suggests that such a correction may be important This explora- 
tion is part of the work we plan for the future. We have done some preliminary calculations, which we set 
forth here. 

If we omit the correction for satellite orientation developed by Holme and Bloxham (1995), the statistical 
models in sections 4 A, 4B, 4C predict that the LML spectrum R(c,l) is the realization of a random variable 
whose expected value varies with / like 

E [£(c,Z)] = (2Z + l)[(2Z+l)(Z + 1) -1 A a' + l(l + l)B p' + C] (22) 

where A ,a,B,j),C are contstants, A and a coming from the core, B and P from the crust and C from errors 
of measurement. On the other hand, Cain et al (1989) fit the observed R(c,l) with the traditional expression. 
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R(c,l) = A a 1 +B p' + C. (23) 

We have made some preliminary fits of (22) to the observed R(c,l) reported by Cain et al (1989). We con- 
sidered two ranges of /: 2<l <40 and 2</ <59, and we used two different fitting techniques: maximum likeli- 
hood and weighted least squares with weights inversely proportional to the variances predicted by our model. 
If S(w) is the apparent "white noise sphere" in the core and S(a) is the CMB, the crustal correction reduces 
the apparent source depth a-w from 436 km to between 240 km and 320 km, depending on the range of l and 
the fitting method. These values still seem too large to permit accepting S ( w ) as physically real. Even after 
the crustal correction we maintain our view that what we see in the LML spectrum is not a white noise source 
but rather the statistical regularity of B r on S ( a ). The crustal correction reduces the correlation length of B r 
on S(a) from 750 km to between 370 km and 500 km, depending on the range of / and the fitting method. 

Fitting (22) to the observed satellite power spectrum also gives values for the crustal parameters B and P and 
the parameter C describing instrument error. For the crustal magnetization, we find a horizontal correlation 
length which lies between 210 and 540 km, whereas Jackson (1994a) found 50 km. Our results are preliminary, 
and we are not even sure yet that our program is reliable. However, Jackson used the model of Cain et al 
(1989) described by (23), which may differ enough from our model (22) to explain the difference in correlation 
lengths. We should also note (A. Jackson, private communication) that Jackson tested the randomness of Cain 
et al’s (1989) individual g ( m (c) in 16 <45 by normalizing them with Cain’s values of R(c ,1). This amounts to 
determing 30 parameters of the distribution from the data, and may require a Monte Carlo amendment of the 
direct K-S test of randomness used by Jackson. Perhaps he will repeat his test, and we plan to do so. Our own 
randomness test fails at the moment. When we divide Cain’s gauss coefficients by their standard deviations, as 
derived from the best fitting (22), the normalized coefficients fail the gaussian K-S test for 2<L if L is much 
larger than 20. Preliminary computations indicate that the normalized gauss coefficients with 2< / <40 pass the 
test if we delete those with 15 </ <25. Jackson (1994a) raised the possibility that the crust may not be well 
described by a rotationally invariant stochastic model. Barring program errors, we may be seeing indications of 
what he suggested. There may be crustal randomness above / = 15 and geography below. This whole area 
needs much more work. 

It remains to discuss the C obtained in the fit of (22) to the observed satellite LML spectrum. As noted in 
section 4C, if we omit the treatment of satellite orientation errors developed by Holme and Bloxham (1995), the 
value of C should be approximately a* (D/3)~\ where a 2 is the variance of the error in measuring one com- 
ponent of B at one site, and D is the number of such components in the data vector. Cain and Langel (private 
communications) both suggested that D ~ 50,000 was a reasonable estimate for MAGSAT. Then our C ’s gave 
values of cr between 3 and 7 nT, depending on the range of / and the fitting method. Langel et al (1982b) 
estimated a by a quite different procedure: combining engineering estimates of the various sources of error in 
the instruments and the satellite orientation (which they assumed would produce an isotropic random error in 
B). They found a = 5.8 nT. 

Our preliminary results suggest a number of lines of investigation in the first stage of the work. We must 
explore our numerical techniques very carefully, because both of our penalty functions appear to have several 
minima. Considerable exploration of parameter space will be required before we can have any confidence that 
a rotationally invariant stochastic model of the crust must be abandoned. If that turns out to be the case, we 
must look for anisotropic stochastic crustal models. If that search fails, we must determine how much of the 
crustal signal can be modelled stochastically and how much requires deterministic modelling. A statistical 
model of the crust or a deterministic geological model would lend themselves to extrapolation to the low l 
where the core is visible magnetically. It is hard to see how, by themselves, deterministic gauss coefficients in 
the range / > 15 could do so, even if we were confident that they represented the crust. 

It would be very interesting to see whether we can detect in the data the difference between (22) and its 
modification when corrected for Holme and Bloxham’s (1995) treatment of satellite orientation error. Such 
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detection seems unlikely with existing gauss coefficients, since their authors mistrust them above / =40 (Cain 
and Langel, private communications), while the crustal signal appears to dominate the measurement errors 
below / =50 (Arkani-Hamed and Strangway, 1986; Cain et al, 1989). 

Perhaps the most interesting consequence of a convincing statistical model for the "visible" gauss coefficients 
of the core (2 < / < 12) would be that the boldest observers might be willing to extrapolate this core model to 
higher / , where the core is concealed by the crustal signal or measurement noise. Such a statistical core model 
would provide a soft prior bound on the deterministic core model x E that was much sharper than either of the 
hard prior bounds, ohmic heat production or magnetic energy. As already mentioned, this soft prior bound may 
be strong enough to permit estimation of B r at most points on the CMB, a possibility foreclosed by both hard 
prior bounds. 

5. FUTURE PLANS 


The first stage of this project involves working with the gauss coefficients already available in the literature, 
in order to gain some understanding of whether the soft prior bound on x £ is defensible, whether the crustal 
signal can be treated as partly or fully stochastic, and what is the structure of the errors of measurement. In the 
second stage of our work, we hope to generate our own gauss coefficients from the MAGSAT and OERSTED 
meaurements of B in near-earth orbit. (The PI has been accepted as an OERSTED investiagator.) An essential 
part of this process will be to use the inversion scheme described in Section 2 to estimate the systematic and 
random errors in our gauss coefficients. In our truncated approximation to the geomagnetic inverse problem, 
the truncated model space X TA will usually be the space of gauss coefficients up to some maximum degree, 
although Constable et al (1993) have found it useful in calculating flux on the CMB to represent the core field 
there by a continuous, piecewise-linear approximation to its radial component. Our inversion formalism applies 
to any method for approximating the field sources, and we plan to try several. 

In addition to the satellite data, our second stage inversion will use what we have learned in the first stage 
about possible models for the core field, the crustal field and the errors in measuring B. The second stage will 
also require modelling fields from the ionosphere and magnetosphere, since they do contribute measurably to B 
(Langel and Estes, 1985; Cain et al, 1989; McLeod, 1992, 1994). 

MAGSAT measured all three components of B, and that is also the plan for OERSTED, so the part of our 
inversion which produces the gauss coefficients can be linear. The estimated errors in the MAGSAT com- 
ponent data were three times those in the intensity data (Langel et al, 1982), so we will try to construct a trun- 
cated approximation to the inverse problem which uses the intensity data as well as the components. We plan 
first to try iterative schemes based on linearization, as is usually done in non-linear least squares. Component 
data must be included when using the intensities because it is known that with intensities alone there is no use- 
ful truncated approximate of the inverse problem (Backus, 1970b). 

In estimating core gauss coefficients in the second stage we propose to reexamine the literature on magnetic 
jerks and long-period magnetic induction (Langel and Estes, 1985; Ducruix et al, 1980; McLeod, 1992, 1994) in 
the hope of estimating an upper bound on the electrical conductivity in the lower mantle. Previous work (Ben- 
ton and Whaler, 1983) suggests that mande conductivity may not much affect main field models, and that its 
main effect is simply to delay core signals by perhaps a decade or less in their upward passage from the CMB 
to the surface (Backus, 1982). It is conceivable that the two satellites will give us error bounds on the core 
gauss coefficients tight enough to justify including mantle currents in the inversion. 

Among the questions we hope to answer in the second stage are these: what is the highest degree l at which 
the satellite data give us whole earth gauss coefficients larger than their error estimates? Do these gauss 
coefficients support the conclusions reached in the first stage of the work? For example, does our statistical 
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model of B r at the CMB remain viable, perhaps with some changes in parameters? Can part of the crustal field 
be treated stochastically, and, if so, is it a part which admits plausible extrapolation of its statistics to the range 
of harmonic degrees between 1 and 14, where the crustal signal is obscured by the core. If so, then we can at 
least estimate how much random error the crust contributes to the core gauss coefficients, even if we cannot 
separate the core and crustal parts of those coefficients. If the crust cannot be treated stochastically in the range 
of degrees where we see the core coefficients (1 < / < 14 or less), we can try to get hard bounds on the errors in 
the core coefficients from hard bounds on crustal magnetization, or we can join the attempts to find a deter- 
ministic geological model. At the moment we have no good ideas for this latter project, but other workers are 
pursuing it actively (Whaler, 1994). 

Since the MAGS AT and OERSTED satellite data will be separated by at least 15 years, we hope that our 
enror bounds at the CMB will be tight enough to throw light on some of the outstanding theoretical controver- 
sies which involve predictions about time behavior. Can null-flux curves be seen to appear or disappear, or can 
the flux through them be estimated accurately enough, to rule out the frozen-flux hypothesis of Roberts and 
Scott (1965)? If not, can the areas inside null-flux curves be estimated accurately enough to test their con- 
stancy in time, as required by Bloxham’s (1990) suggestion that the core fluid velocity just below the boundary 
layer at the CMB might be toroidal? Can the areas inside the projections of null-flux curves onto the equatorial 
plane be estimated accurately enough to test their constancy in time, as required (Jackson, 1994b) by the propo- 
sal of Le Mouel et al (1985) that the fluid velocity just below the CMB boundary layer is radially geostrophic? 
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Truncation, Approximation and Procrastination as Inversion Techniques 

George e. backus 

Institute of Geophysics and Planetary Physics, University of California, San Diego 

By examining the processes of truncating and approximating the model space, and by committing to 
neither the objecdvist nor the subjectivist interpretation of probability, we construct a formal scheme for 
solving linear and nonlinear geophysical inverse problems. The necessary prior information about the 
correct model x £ can be either a collection of inequalities or a probability measure describing where x £ 
was likely to be in the model space X before the data vector yo was measured. The results of the 
inversion are i) a vector Zo that estimates some numerical properties z £ of x £ ; ii) an estimate of the 
error 5z = Zo~z £ . Since y 0 is finite dimensional, so is Zo, and hence in principle inversion cannot 
describe all of x E . The error 8z is studied under successively more specialized assumptions about the 
inverse problem, culminating in a complete analysis of the linear inverse problem with a prior quadratic 
bound on x £ . As an idealized example we study the magnetic field at the core-mantle boundary, using 
satellite measurements of field elements at sites assumed to be almost uniformly distributed on a single 
spherical surface. 


1. Introduction 

As part of a project for inverting data from the magnetic 
satellites MAGSAT and OERSTED, the authors have 
developed a general formalism for estimating errors in geo- 
physical inverse problems. This formalism applies to non- 
linear as well as linear problems, to regularization (Tikho- 
nov, 1963), to hard and soft prior information, and to both 
Bayesian and ffequentist treatments of that prior informa- 
tion. The formalism appears to include and unify many of 
the techniques used in the last 25 years of geomagnetic 
inversion (Franklin, 1970; Backus, 1970a,b,c, 1988a, b, 1989; 
Jackson, 1979; Langel, 1982; Gubbins, 1983, 1984, 1985; 
Tarantola, 1987; Cain et al, 1989; Stark, 1992; Donaho, 
1992). In the present paper we describe and illustrate this 
formalism in the hope that it will be of interest to others 
working with data inversion. 

The formalism shows how to find error bounds, but does 
not by itself assure that they will be small enough to be 
interesting. Therefore we emphasize at the outset that we 
have not yet applied the formalism to any non-linear prob- 
lems. Non-linear inversions are usually problem-specific and 
often involve deep issues of'hard" analysis. We have exam- 
ined none of these issues. Our reason for describing the for- 
malism in enough generality to cover non-linear problems is 
that, as often happens, generality simplifies. 

Our illustration of the formalism will be an idealized 
linear MAGSAT inversion rather like that in Backus (1989). 
The idealized measurement sites are assumed to be almost 
uniformly distributed on a single spherical surface about 420 
km above the earth’s surface. As even this idealized linear 
example will illustrate, the investigator has a wide choice of 
ways to implement the formalism. There are, however, 


some aspects of the formalism that can be developed in con- 
siderable detail for general linear problems. 

Since the formalism aims to be useful to both Bayesians 
and ffequentists, we begin by describing how we use those 
terms. Frequentists (objectivists) hold with Neymann (1937) 
that probability has no experimental meaning except as an 
estimate of frequencies of various outcomes in a repeatable 
series of random trials. Bayesians (subjectivists) hold that 
probability distributions can serve as quantitative descrip- 
tions of their personal beliefs. Objectivists have sought to 
construct multidimensional confidence sets for a collection 
of numerical predictions about the correct earth model x E 
(Backus, 1989; Stark, 1992). Subjectivists have imposed 
prior personal probability distributions (ppd’s) on the model 
space X to describe their prior beliefs about the location of 
x E in X . Then they have used the classical theory of condi- 
tional probability (Bayesian calculus) to compute posterior 
ppd’s for x E from the data and the prior ppd’s (Backus, 
1970a, 1988a; Jackson, 1979; Gubbins, 1983; Tarantola, 
1987). Subjectivists sometimes introduce a prior ppd that 
purports to be a probabilistic imitation of some generally 
accepted bound on x E . Backus (1988b) showed that such an 
imitation introduces spurious prior "information” about x E 
that is not implied by the bound being imitated. Appendix 
A gives the details for gaussian probabilistic imitations of 
quadratic bounds. 

In the present paper we take no position on the question 
of whether to interpret probability objectively or subjec- 
tively. We procrastinate on the philosophical issue, and 
leave it to each camp to interpret our final results according 
to their own views of probability. 

We will need some of the notation used in the theories of 
sets, functions, measures and Hilbert spaces. Most of this is 
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in MacLane and Birkhoff (1967) and Halmos (1950, 1951, 
1958). For completeness, we list it in Appendix B. 

2. THE GENERAL INVERSE PROBLEM 

We give here a concise description of a typical geophysi- 
cal inverse problem. It provides us with D real numbers, 
y 1 , • • ■ ,y D (the data). From the data we try to estimate P 
other real numbers z 1 , • • • ,z p (the predictions). Both the 
data and the predictions could be calculated exactly from 
the correct earth model, x £ , if we knew it. However, it is 
an unknown member of a known model space X, ususally 
infinite dimensional. We introduce two finite-dimensional 
real vector spaces, Y and Z, the data space and the predic- 
tion space, consisting respectively of all ordered D -tuples 
and all ordered P -tuples of real numbers. We are given a 
data function F:X->Y and a prediction function G:X->Z, 
whose coordinate functions we denote by F‘:X-»R and 
G':X— »R, where l<i <D and 1 <j <P . If there were no 
errors, the observed data would be the entries y‘ E = F‘(x £ ) 
in the D -tuple y £ = F(\ E ). We want to find the entries 
2 E j = G'(x £ ) in the P -tuple z £ = G(x £ ). It is crucial to 
recognize the futility of trying to find x £ itself. We cannot 
record infinitely many real numbers, much less compute 
with them. Of course z £ may consist of P of the infinitely 
many parameters required for a complete description of x £ . 

The data include both a random error 8 £ y and a sys- 
tematic error 8 s y, so that the measured data vector y 0 is 
related to the error-free data vector y £ = F(x £ ) through the 
equation 

yo = y£+8 £ y+8 s y. (2.1) 

By definition, an error is not known exactly, but if we know 
nothing about it then the data are useless. Systematic and 
random errors are distinguished by the kinds of partial infor- 
mation we have about them. For the systematic error 8 s y 
we know only that 

8 s y€V s ( 2 . 2 ) 

where is a known subset of the data space Y . We will 
call V s the confinement set for 8 s y. Often Y has a norm, 
|| ||, and V s is the solid ball By (a) with radius a centered 
on 0 in Y. Then (2.2) becomes simply ||8 s y||<a. Extend- 
ing Jackson’s (1979) terminology, we will describe (2.2) as 
a "hard bound" on 8 s y. 

A random error 8 s y is a realization of a vector random 
variable taking values in Y. This random variable is com- 
pletely described by its probability distribution, a probability 
measure % on Y. If V is any Borel subset of Y, •q £ (V r ) is 
the probability of the event 8* ye V. Following Jackson 
(1979) we call 1 a "soft bound" on 8*y. We will assume 
that tj £ is known except for a few parameters determined by 
fitting the data. Backus (1989) discusses some aspects of 
estimating those parameters. We will ignore that question 


here. We will assume that y‘ and y‘y j are integrable with 
respect to q £ , and as a mnemonic device we will write 
E(i\ r ,y‘) and E(t\ r ,y‘y') in the forms E(8 R y‘) and 
E(8 R y‘8 R y'). We will assume that 

E(8*y‘) = 0 (2.3) 

because a non-zero E(8 R y‘) belongs in y £ if known and in 
8 s y‘ if unknown. Because of (2.3), the DxD variance 
matrix V of 8 R y has the entries 

V iJ =E(8 R y i 8 R yO. (2.4) 

Given y 0 , the confinement set V s for 8 S y and the proba- 
bility distribution % for 8 R y, we might try to predict the 
value of z £ = G(x £ ). Such an attempt is doomed unless 
G = HoF for some function H: Y— »Z (Backus and Gilbert, 
1967). Usually there is no such H and doom is avoided by 
including in the inversion some prior information about x £ , 
information available to us independently of y 0 , V s and % . 
This information usually takes the form of a hard bound or a 
soft bound on x £ in X. Some workers also admit "firm” 
bounds, hard bounds that are imprecisely known. We will 
not do so, because our prior information is often imprecise. 
Any inversion must include verifying that the results are 
insensitive to geophysically acceptable changes in the prior 
information. 

In the problem of geomagnetic inference at the CMB the 
model space X consists of all magnetic fields whose sources 
lie inside the core of the earth. The data vector y consists 
of D elements of the geomagnetic field B measured at satel- 
lite altitudes or on the earth’s surface. The prediction vector 
z £ might consist, for example, of P coefficients in the 
spherical harmonic expansion of the magnetic scalar poten- 
tial. It could also include some fluxes through null flux 
curves on the core-mantle boundary (CMB). One example 
of a hard prior bound on the correct model x £ is that its 
magnetic energy must be less than minus the gravitational 
self energy of the earth (see appendix B). Another hard 
bound is that the ohmic heat production in the core must be 
less than the heat loss out of the earth’s surface (Parker, 
1972; Gubbins, 1983; Backus, 1988b). This latter bound 
might fail if the earth’s thermal regime were significantly 
non-steady or if some of the ohmic core heat helped to drive 
the dynamo instead of diffusing out of the core (Backus, 
1975). 

In the geomagnetic problem, an example of a soft bound 
might be a probability distribution for the radial magnetic 
field on the CMB. Objectivists would want such a probabil- 
ity measure |i £ on X to have an objective basis, perhaps a 
well-accepted dynamo theory or perhaps simply an empirical 
fit to the data. For a subjectivist, p £ could describe any 
personal beliefs. A probability distribution, however, usu- 
ally cannot describe belief in a hard bound. Backus (1988a) 
showed that no hard bound on the norm of x £ can be 
approximated by a probability distribution. Appendix A 
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gives the details for the simple case of gaussian distribu- 
tions. If the subjectivist starts out from a prior hard bound 
widely accepted by the community, and claims to believe 
the extra prior information implied by some probabilistic 
imitation of it, his inversion of the data will be no more 
convincing to colleagues than is the extra prior information. 

Henceforth we will use ( X,F,G ) as the label for the 
inverse problem we have just described. This label is con- 
venient but incomplete. It makes no mention of the error 
bounds or the prior information, both of which are essential 
parts of the problem. For example, the prior information 
about x E often determines what properties of x E can be 
estimated from the data (Backus, 1989). We will call the 
inverse problem (X,F,G) linear if X is a real vector space 
and F and G are linear functions. Whether (X,F,G) is 
linear or not, our goal is to find an estimate z 0 for z E and to 
describe the error in that estimate. 

3. TRUNCATED APPROXIMATIONS 


Qta 0 Qta - Qta ■ 


(3.2c) 


The present section describes a formal solution to the 
inverse problem (X ,F ,G) set forth in the preceding section. 

Some of the steps in this formalism involve delicate ques- 
tions of functional and numerical analysis. The later parts 
of the present paper discuss those questions for linear prob- 
lems, but for non-linear problems we offer only a descrip- 
tion of what to compute, not how to compute it. Our for- 
malism focuses on truncation and approximation, since we 
believe these are the crucial issues in modelling. If treated 
properly, they produce an inversion scheme which ought tcBj^y - Fta (F ta( x e))~F ( x e)> 
satisfy both Bayesians and frequentists, and which can sup- 
ply error estimates based on either hard or soft prior bounds. 

The formalism depends on finding what we will call a 
truncated approximation (TA) to the inverse problem 
(X,F,G). A TA is an ordered quintuple 
TA - { X TA , P TA , F ta ,Q ta , G ta ). The first entry of TA, 

X TA , is a subset of a finite-dimensional topological space. It 
is often, but need not be, a finite-dimensional subspace of 
the model space X. The last four entries of TA are func- 
tions, whose domains and codomains are as follows: 


We call the TA (X TA ,P TA ,F TA ,Q TA ,G TA ) linear if 
(X,F,G) is a linear problem, X TA is a linear space, and 
P T a ,F ta , Q ta , G ta are linear functions. 

Although the prior information about x E is not explicitly 
listed in labelling the inverse problem as (X,F,G), that 
information usually plays a crucial role in constructing use- 
ful TA’s. For example, if X is the space of magnetic fields 
B with sources only in the core and the data are intensities 
iBl, then any field B that fits the data will have a twin, -B, 
that also fits the data. Unless we have prior information 
about the direction of B (e.g. that the dipole moment is 
positive) we cannot construct an invertible F ta or a TA. 
(This particular inverse problem has another, more serious, 
non-uniqueness whose remedy is at present unknown; 
Backus, 1970b.) 

An inverse problem has many TA’s. How to choose the 
best or even merely a good one can be discussed only after 
we have seen in the present section how to use them. We 
will treat optimization later and only for linear TA’s. To 
use a TA we begin by defining its errors of truncated 
approximation. 


(3.3a) 


(3.3b) 


^TA y ~ ItA ( *£ )> &TA Z ~ gTA( x E) 

where 

f’YA = T'lA O P TA , gTA ~ G T A O F - G 

These definitions are equivalent to the statements 

6ta z = G ta (P ta(*e))-G(x e ). 

If our prior information is a hard bound on x E , say 

x E 6 U E , (3.4a) 

then from (3.3a) we have hard bounds for 67-4 y and 87* z, 
namely 

&TAy e Vta — /taWe)* 5 rA z€ W TA = g TA ({/©4b) 


P ta '-X—>X TA , (3.1a) 

Fta-Xta-^Y, (3.1b) 

Qta'-Y^>Y ta where Y TA = F TA (X TA ), (3.1c) 

G ta :Xta->Z. (3. Id) 

We put no restrictions on these functions except to demand 
that Q ta be continuous, that 

Qta I Y ta = / r TA » (3.2a) 

and that F TA have a continuous inverse, 


F-Fa'.Yta ->X : 


TA 


(3.2b) 


If our prior information is a soft bound on x E , a probability 
measure (p £ ,Z x ) for x E in X, then |i. £ : X x -» R and we 
have soft bounds on S^y and 8 TA z. These are probability 
measures (% A ,Xr) and (Ct/i.^z) for 87-4 y in Y and 874 z in 
Z . They are defined as 

= tig 0 /ta > Cra = tig 0 gm - (3.5) 

Since 87-4 y and 874 z are both functions of the random vari- 
able x E , they are themselves random variables but will not 
usually be independent 

If a TA, ( X74 , P TA , F ta > Qta , Gta . ), is available, it pro- 
vides a complete formal solution of the inverse problem 
(X,F,G). Since y E = F(x £ ), (2.1) and (3.3) imply 

F T a °P ta( x e) = yo - 57?y- 8^+87^ . 

By (3.2a), Q TA oF TA = F TA , so 


Note that (3.2a) implies 
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F T a oP ta( x e) - GrA(yo - 8*y - 5sy + 5j- A y). 

But F TA : Y ta -*X Ta exists, so the foregoing equation 
implies 

Pta(*e) = f ta° CrACyo-S/jy-Sjy+^y). 
Therefore 

Gj A o P ta ( x e ) = G ta o F-f A o Q ta (yo _ 8^y _ 8 s y + & TA y) . 
Now z £ = G(x £ ), so (3.3) and the foregoing equation imply 


that 




= H ta (y 0 -8 £ y-8 5 y + 8 rA y)-8 7 - A z 

(3.6a) 

where 

H TA = G ta 0 Ff A 0 Q ta . 

(3.6b) 

We can rewrite these results as 



Z E " Zq—8z 

(3.7a) 

if we define 

Zq = Hta(Yo) 

(3.7b) 


and define 


8 z = H TA (y 0 )-H TA (y 0 -d R y-b s y+6 rA y)+5 rA zOJc) 

Equations (3.6) or (3.7) constitute a formal solution to the 
inverse problem ( X,F,G ). Under certain conditions that 
formal solution offers an opportunity to use the data to test 
the hypothesis that % is the correct probability measure for 
b R y. Those conditions are that Q TA be linear and not sur- 
jective and that 5™ y and 8 s y be negligible in comparison 
with 8 £ y. To test the hypothesis, note that since 
Ye +&TAy = f ta° P TA (\ E )= Q TA o F TA o P TA (x E ), there- 
fore from (3.2c) 

(h ~Qta)(Ye +8rAy) = 0- (3.8a) 

Invoking (2.1) and the assumed linearity of Q TA , we con- 
clude that 

Uy ~CrA)(yo) = ih~ Q ta ) ( 8 « y + 8 ^ y - b TA y) . (3.8b) 

If 8 s y and 8 TA y can be neglected in (3.8b), then the right 
hand side of that equation reduces to (I Y -Qta)(^eY)- This 
latter vector is drawn at random from the subspace 
(Iy~Qta)(Y) of y, and its probability measure on that 
subspace should be 

t\rq = % o ( Iy~Qta ) 1 (3-9) 

if really describes b R y. Given r\ R , we can calculate r\ RQ 
from (3.9) and then use various tests (Kendall and Stuart II, 
1979, Ch.30; see also Backus, 1989) to compute the proba- 
bility that (ly- Qta) (yo) could have been drawn at random 
from the distribution (3.9). 

It remains to understand and control the size of the error 
8 z in (3.7c). For non-linear problems, such estimates can 
often be based on the modulus of continuity of H TA and on 


how closely F TA o P TA and G TA o P TA resemble Q TA o F 
and G respectively. For linearizable problems we will later 
discuss these estimates in some detail. 

The rest of this section examines how objectivists and 
subjectivists would react to (3.6) and (3.7). Objectivists 
have only two cases to consider: is the prior bound on x £ 
hard or soft? First, suppose it hard, as in (3.4a), so that 
8 ™ y and 87 * z are systematic errors with confinement sets 
(3.4b). Objectivists would convert the soft bound on S £ y to 
a hard bound on z £ as follows: choose a small positive 
number e r and a Borel set V R of Y as small as possible in 
some sense (perhaps in diameter if Y has a norm) but large 
enough to assure that q £ (V R ) > 1 - e r . If 8 £ y e V R then 
(3.6a) implies that 

z E e H T A{yo~V R -V s + V T a)-Wj A . (3.10) 

If (3.10) is false, then an event (b R y£V R ) has occurred 
whose probability is less than e R . That is, (3.10) is true at 
confidence level at least l-e £ . If the data arise from 
aspects of x £ that are relevant to the predictions, then an 
adroit choice of a truncated approximation TA will make the 
confinement set (3.10) for z £ small enough to be interesting. 
In passing, we note that the confinement set (3.10) can be 
replaced by the smaller set consisting of all those z e Z that 
can be written H TA (y 0 -5 R y-5 s y+f TA (x))-g TA (x) with 
8 £ ye V R ,8 s ye V s ,andxG U E . 

For an objectivist the second case is that the prior bound 
on x £ is soft, a probability measure (jt £ ,Z x ) locating x £ in 
X . Now the objectivist chooses two small positive numbers, 
E R and e £ and two subsets V R czY and U E cX , as small as 
possible in some sense but large enough that 
q £ (V £ ) > l-e £ and n £ (t/ £ )>l-e £ . The objectivist would 
use this U E in (3.4b) to define V TA and W TA . If 8 s ye V R 
and x £ e U E , than (3.10) holds. Otherwise, either & R y£V R 
or x £ £U e or both. The probability of this union of two 
events is at most the sum of their separate probabilities, 
e r +e e . Thus (3.10) is true at a confidence level of at least 
1-e r -e £ . Again, as in the preceding paragraph, the 
confinement set (3.10) can be slightly improved. 

For objectivists it remains to say what "small" means 
when applied to V R or U E . If dim Z = 1, the answer is 
simple: for a given e r (or e r and e £ ) choose V R (or V* 
and U E ) so as to minimize the diameter of the set (3.10). 
Usually that set will be an interval, and then its length 
should be minimized. If dim Z > 1, the situation is less sim- 
ple, and Stark (1992) and Donaho (1992) have discussed 
various optimizations in some detail. 

Subjectivists will approach (3.6) and (3.7) differently, in 
order to exploit their view that they can describe their 
beliefs about z £ by means of a personal probability measure 
(Cs.E z ) on Z. A naive subjectivist approach to (3.6) is to 
convert all the hard bounds to probability distributions and 
then to find ^ from standard probability calculus. This 
naive method fails because 8 s y and x £ belong to spaces of 
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high dimension, so softening their hard bounds adds spuri- 
ous "information" (Backus, 1988b; see also appendix A). 
The naive approach does succeed when the systematic errors 
are negligible in comparison with the random errors. We 
give two examples. 

First, suppose 8 ^y, 87 * y, and 87 * z are all systematic 
errors, but are negligible in (3.6a). Then that equation can 
be written approximately as 

z/; = U TA (y E ) where y £ =y 0 — 8 /jy. (3.11) 

Subjectivists, unlike objectivists, would be willing to regard 
y £ as a random variable. In principle it can take only its 
true value, but a subjectivist would describe his personal 
opinion about it by means of a personal probability measure 
(q £ ,I r ) on Y . Most subjectivists would obtain this Tig 
from the error distribution r\ R by requiring for each Borel 
subset V of Y that 

r\E(V) = r\ R (y 0 -V). (3.12a) 

The argument for (3.12a) is that because of (3.11) the two 
events y £ e V and 8 £ ye y 0 -V are the same, and so must 
have the same probability. Applying this argument to z £ in 
(3.11) would lead subjectivists to adopt 

Ce = tic 0 Hta (3.12b) 

as the probabilistic description of their belief about where 
z £ lies in Z . 

In the second subjectivist example without hard bounds, 
suppose that 8 s y is negligible in (3.6a) and that the prior 
information is a soft bound, a probability measure ( |i £ , ) 

on X. Using (3.3), we write (3.6a) as 

*E = H TA (y 0 -5 R y+f TA (x E ))-g TA (\ E ). (3.13) 

The right side of (3.13) is a function of y 0 , 8 £ y and x £ , so 
we write it h TA ( y 0 ,5 £ y,x £ ). Then h TA (y 0 ,- ,):YxX -»Z. 
A subjectivist would likely regard 8 £ y and x £ as indepen- 
dent random variables, so their joint probability measure on 
YxX would be the product measure Tt £ |! £ of % and |i. £ 
(Halmos, 1950). Then a subjectivist would choose 

Ce = ( t 1«M-e)o h TA (y 0 ,- ,•) *. (3.14) 

If neither random nor systematic errors are negligible and 
the inverse problem is nonlinear, we have no advice for sub- 
jectivists except to linearize (vide infra) or to become tem- 
porary objectivists. 

If H ta in (3.6) is linear or linearizable, error estimation 
from (3.7) is much simplified for both objectivists and sub- 
jectivists. If 8 £ y, 8 s y and 8 j A y are small enough and DH TA 
is the gradient function of H TA at y 0 , then to first order in 
the errors we can write (3.7c) as 

8 z = A rA z+ 8 £ z + 8 s z where (3.15a) 

8 £ z = DH TA (d R y) (3.15b) 

8 s z = DH TA (b s y) (3.15c) 


&ta z ~ (Sta - DH ta 0 / 7 * X x £ ) (3.15d) 

Here f TA and g TA are given by (3.3b), and if H TA is linear 
then DH ta =H ta . Clearly 8 s z is a systematic error with 
confinement set DH TA (V S ), V s being as in (2.2), and 8 £ z is 
a random error with probability measure 

C*=ti £ o (DH ta )~\ (3.16) 

r[ £ being the probability measure for 8 £ y. Finally, if the 
prior information about x £ is the hard bound (3.4a), then 
Aja z is a systematic error with confinement set 

(g TA ~ DH ta of TA )(U E ). If the prior information about x £ 
is a probability measure (it £ ,5^ ), then A rA z is a random 
error with probability measure 

Cta ( 8ta ~DH ta o/ta) 1 • 

When the linearization (3.15) is possible, if we estimate 
z £ as H ta ( y 0 ), we commit an error 8 z that is the sum of 
one random and one systematic error. If the prior bound on 
x £ is hard, the random part of 8 z is 8 £ z and the systematic 
part is A ta z+6 s z. The confinement set of the latter is 
( Sta ~ DH ta o f TA )( U E ) +DH TA ( V s ). If the prior bound on 
x £ is soft, the systematic part of 8 z is 8 s z and the random 
part is A rA z+ 8 £ z. It seems reasonable to suppose that 
A 7a z and 8 S z are independent, so the probability measure of 
their sum is the convolution of their probability measures 
(Kendall and Stuart, 1977, p200ff). When H TA is lineariz- 
able, the simple form of 8 z as the sum of a random and a 
systematic error somewhat simplifies the objectivisms calcu- 
lation of a confidence set Linearizability has a more pro- 
found effect for the subjectivist. If dim Z >3, appendix A 
suggests that subjectivists seeking a wide audience will want 
to report the error in 8 z as the sum of two errors, a random 
error with known probability measure and a systematic error 
with known confinement set. Imitating the systematic error 
with a subjective probability measure when dim Z > 3 will 
introduce spurious information. Only if dim Z <2 will sub- 
jectivists want to soften the confinement set for the sys- 
tematic error in z to a probability distribution and obtain a 
probability measure for 8 z as the convolution of two random 
parts. Only when H TA is linearizable and subjectivists have 
softened the systematic error in z can they speak sensibly of 
the correlation between the errors in two predictions. 
Objectivists would never do so except when all systematic 
errors were negligible. 

4. NATURAL DOT PRODUCTS ON X , X TA AND Y 

In every inverse problem the probability measure % for 
the random error in the data vector generates a dot product 
on the data space Y. In certain inverse problems the prior 
information about the correct earth model x £ generates a dot 
product on the model space X or the truncated space %ta ■ 
When available, these dot products are powerful tools for 
estimating the prediction error (3.15). Therefore, we briefly 
examine all three dot products, following Backus (1989), 
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who discussed two of them at length and derived the 
machinery for obtaining the third. 

We begin with the dot product on X. This is available 
when X is a real vector space and the prior information is a 
quadratic bound for the correct earth model \ E : 

Q(x E ,x E )<q. (4.1) 

Here q is a known positive real number and Q is a known 
symmetric, positive-definite bilinear form on X. That is, for 
any Xj,x 2 eX, 0(x 1 ,x 2 ) is a real number that depends 
linearly on each of x x , x 2 when the other is fixed; also 
Q(x x ,x 2 )= <2(*2» x i) and. if x * 0, then Q(x,x)>0. 
Given a quadratic bound (4.1), we define the dot product of 
any x t ,x 2 eX to be 

xi-x 2 = q~ l Q(x i,x 2 ). (4.2) 

We write the norm for this dot product as || x || , so 

|| x|| =(x-x) 1/2 . (4.3) 

In terms of this norm, the prior quadratic bound on x E is 
simply 

II x £ || < 1. (4.4) 

If X is not complete in the norm (4.3), we complete it, so 
that it becomes a real Hilbert space with dot product (4.2) 
(Halmos, 1951). If dim X were finite, this step would be 
unnecessary; finite dimensional spaces are always complete, 
i.e. Hilbert spaces, and all their subspaces are closed. Every 
closed subspace U of X has an orthogonal projector 
n„:X-*t/ . 

Next we consider the dot product generated on X TA . This 
dot product is available when TA is a truncated approxima- 
tion to (X,F,G), when X TA is a real vector space, and 
when the prior information about x E is a probability meas- 
ure p.£ on X . On X TA we define a probability measure \i TA 
as 

Mta (4.5) 

Backus (1989) shows that, since X TA is finite-dimensional, 
there is a unique dot product on it relative to which the vari- 
ance tensor of \i TA is the identity tensor. To define this dot 
product without invoking tensors, we note that it is uniquely 
determined by the fact that for any fixed W! and w 2 in Xta 

E [(w,-w)(w 2 -w)] -E[w x -w]E [w 2 w] = v/ x -w 2 , (4.6) 

where w is a vector random variable in %ta with probability 
measure ]i TA (see Backus, 1989). 

Finite dimensionality is crucial to the foregoing construc- 
tion of a dot product on X TA . Backus (1988b) showed that 
this construction must fail in an infinite dimensional space 
because then the part of the space consisting of vectors with 
finite norm has measure zero. 

Finally we turn to the dot product on Y , which we con- 
struct from the DxD error variance matrix V defined by 


(2.4). We adopt the Einstein summation convention: in a 
term or product of terms if an index appears once as a sub- 
script and once as a superscript, then a sum over all possible 
values of the index is understood. We will assume that no 
linear combination of y 1 , • • • ,y D can be measured with 
perfect accuracy. This means that if a x , ■ ■ ■ ,a D are any 
real numbers not all zero, then 0<£[(a,8 s y‘) 2 ] = 
E [ (a, 5* y‘ ) ( a, h R y J )] = a, aj E [ 8* y‘ 8* y > ] = a, a, V ij . 
Therefore the matrix V of (2.4) is positive definite. It is 
obviously symmetric, so it has a positive-definite symmetric 
inverse, V~ l . 

If % , the probability measure for h R y, were gaussian, its 
density function would be a constant times 

exp(-y‘ Vij'yj/ 2). This suggests introducing the following 
dot product on Y even when % is not gaussian: if 
u = (m 1 , • • • ,u D ) and v = (v 1 , • • ■ ,v°) then 

u ■ v = «* Vy 1 v j . (4.7) 

The fact that V~ x is real, symmetric and positive definite 
assures that (4.7) does define a dot product. We define the 
norm ||y|| for this dot product as in (4.3). Since 
dim Y <°o, Y is complete in this norm, and every subspace 
U of Y has an orthogonal projector n y : y~> U . 

It seems intuitively obvious that increasing the random 
error 8* y decreases the norm of the data vector y, since the 
latter is measured in units of the former. Later we will need 
this fact, so here we give a proof. Suppose that 
8*y = 8/fyi + 8*y 2 where 5 £ y, and b R y 2 are independent 
random variables with mean 0 and variance matrices V x and 
V 2 . If V is the variance matrix of b R y, then, of course, V = 
V x + V 2 . If || y || and || y || , are the norms on Y produced by 
V and Vj, we claim that 

II y II < II y II i for all nonzero y € Y. (4.8) 

We need to show that if yeY and y #0 then 
y V 1 y T <yV x x y T , or 

y ( V f 1 — V~ x ) y T > 0. (4.9) 

Here y T is the column vector obtained by transposing the 
row vector y. Every positive definite symmetric (pds) 
matrix has a pds inverse and a unique pds square root, so 
we can define the DxD pds matrix A = Vf 1/2 V V f 1/2 . For 
any nonzero vector yef, we set u = yV x xa A~ xa Vf 1/2 . 
Then a simple calculation shows that 
y(Ef 1 -V~ l )y T = u(K-F 1 )u r . BatV-V l = V 2 . Since 
V 2 is positive definite and u * 0, we have (4.9). 

Perhaps the most important property of the dot product 
(4.7) is the analogue of (4.6): if u,ve Y then 

E [(u-8*y)(v8*y)] = u-v. (4.10) 

In the language of Cartesian tensors (Jeffreys, 1969) (4.10) 
says that under the dot product (4.7) the expected value of 
the dyad 5 R yb R y is the identity tensor on Y. To prove 
(4.10) without reference to tensors, note that 
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(u- 8 *y)(vS*y) = (u‘ V ij x d R y J )( 8 *y* V u x v' ), so 

£[(u- Vy)(v 5 *y)] = u'Vtj 1 E(& R y j 8 R y k )Vu l v l = 

u‘ Vj-'V* V u x v‘ = u‘ V a V = u ■ v. 

5. DATA COMPRESSION 

When the data are very numerous, it is common practice 
to combine them in various ways so as to generate a data 
vector y of lower dimension with smaller variance. The new 
data will be y 1 = /^(y) , • • ■ , y‘ M = K M (y) where K* : 7-» 
R for 1 < j < M . In this paper we will consider only linear 
functionals K‘ . It is pointless to include among them one 
that is ajinear combination of the others, so we will assume 
that [K x , - •• ,K m ] is linearly independent. Therefore 
M <D . We will call the process of computing 
y = (y 1 , • • • ,y M ) from y = (y 1 , ■ • ■ ,y D ) a "compression" 
of the data. Unless M = D the compression is irreversible; 
y cannot be recovered from y. We let 7 be the vector 
space of real M -tuples, and we define the compression func- 
tion K : Y — >7 by 

K(y) = (K'(y), - ■ ■ ,K M (y)). (5.1) 

From die original inverse problem (X ,F ,G) the compres- 
sion K produces a new, compressed inverse problem 
( X,F,G ) whose data space is Y. In this new problem the 
original data function F, the original probability measure 
H* on Y for the random error 8 „y, and the original 
confinement set V 5 for the systematic error 8 s y are replace 
by the new, "compressed" values 

F = Ko F , V s = K(V S ). (5.2a) 

If (X TA , P TA , F ta ,Q ta , G ta ) is a truncated approximation of 
(X,F ,G) and 

F T a =KqF ta , Q ta —KoQ ta (5.2b) 

then (X JA ta<F ta • Qta » G ta ) is a truncated approximation 
of (X ,F,G). 

The random errors in the compressed data are 
Sjiy-' = K> (5«y), and we can imitate (2.4) by defining 

V* = E (S«y‘ h R y > ), (5.3) 

where now V is an MxM positive definite symmetric 
matrix. Then, imitating (4.5), we can define a dot product 
on Y. If u = (u 1 , • • • ,u M ) and v = (v 1 , • • • , v** ) are any 
vectors in Y, then their dot product is 

u ~ v = u‘ Vj v'. (5.4) 

To use K we must understand how the dot product (5.4) 
on Y is related to the dot product (4.5) on Y. The answer 
hinges on dual sequences. Since 7 is a finite-dimensional 
dot product space and K ‘ : Y—> R is a linear functional on Y, 
there is a unique K-' e Y such that 

K'(y) = K' -y for all ye Y 


(Halmos, 1958). Define 

U = span { K 1 , • • ■ , K M ) , (5.6a) 

Uj = span [ { K 1 , • • • ,K M )\{K> ) ]. (5.6b) 

Since { K 1 , • - ■ ,K M ) is linearly independent, dim U = M 
and dim Uj = M- 1. Therefore dim (Uj L nU)= l, so 
Uj X r\U contains exactly one vector whose dot product with 
K ; is 1. Call this vector K y . The ordered sequence 
(Ki, • • • ,K m ) is the "dual sequence" of (K 1 , • • • ,K M ). 
It satisfies and is uniquely determined by two conditions: 

K; e U for 1 < j < M (5.7a) 

K,K> =5,>. (5.7b) 

(Hereafter 8 , y , 8 ‘y , 8 ‘^ and 8 ,y all denote the Kronecker 
delta, 1 if i = j and 0 if i * j.) The fact that (5.7) 
uniquely determine the dual sequence shows that the dual 
sequence of (Kj , • • ■ ,K M ) is (K 1 , • • • ,K M ). If 
(K'.-'-.K^) is orthonormal, then K‘-K^= 8 ' ; , so 
uniqueness in (5.7) shows that an orthonormal sequence is 
self-dual. 

Both { K 1 , • • • ,K M } and {K lt • • ■ ,K U } are bases for 
U . Therefore, from (5.7) it follows that if u e U then 

u = (u K,)K' = (u • K')K; . (5.8a) 

If ve Y, then dotting v into (5.8a) gives 

u v = (u K,)(K' • v) = (u • K')(K; • v). (5.8b) 

By symmetry, (5.8b) is also true if u e Y and v e U . If we 
set u = K, and v = K* in (5.8b), then (5.7b) implies 

(K,- • Ky)(K^ • K*) = 8 ,*. (5.9a) 

If we set u = K‘ and v = K t in (5.8b), then (5.7b) implies 

(K‘K')(K,-K*) = 8 V (5.9b) 

Therefore the two MxM matrices whose entries are K, Kj 
and K‘ ■ K y are inverse to one another. 

It follows from (5.8a) that fl y : 7— > t/, the orthogonal 
projector of 7 onto U , can be calculated in several ways: 

n y (y) = (y • K; ) K' = (y • K' ) K; = K‘(y) K } . (5.10) 

Now we can describe the effect of data compression on 
the dot product This will involve showing that 

K(U 1 ) = 0 (5.11) 

and that K_ \ U is an isometry between U whh dot product 
(4.5) and 7 with dot product (5.4). That is, K I U : U -» 7 is 
a bijection for which, if u , v e U , then 

K( u)~X(v) = u-v. (5.12) 

To prove (5.11), we observe from (5.10) that for 1 < j < M 
we have X‘ (P/ l/ (y)) J = K i n t/ (y)= X'(y)(K‘ K; ) = 
K j (y) 8 = K‘ (y), so K(PJ(j(y)) = K(y). Since this is true 
for all ye 7, 


(5.5) 
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Ko rty = K, (5.13) 

from which (5.11) follows immediately. 

JTo prove (5.12), we note that K(a) = 
(^‘(u), • ■ • ,K m ( u))= (K 1 - u, ••• ,K m u) and similarly 
for AT(v). Then by (5.4) 

K (u) r K (v) = (K‘ • u) Vf (K'v). (5.14) 

From (5.3) and (5.5) V' 1 = E [ (K* • 8* y ) (K'' • h R y ) ]. 

Therefore, by (4.8) 

V l > = K‘ K'. (5.15a) 


It follows from (5.9) that 

V? = K, K j. 

Thus 


(5.15b) 


For any such pair, the function value E(a,P) is a finite 
dimensional subspace of X with these properties: 

l|F(x)|| > a || x || if xeE(a,P), (6.1b) 

||F(x)|| <P||x|| if xeE(a,p) 1 . (6.1c) 

We show first how to construct a truncator and then how to 
obtain truncated approximations from it. 

We begin by noting that a truncator always exists for a 
linear inverse problem ( X ,F ,G ) in which X is a Hilbert 
space and Y is a finite dimensional dot product space. 
Since F :X— >Y is continuous, it is bounded (Halmos, 1951). 
That is, there is a number M such that ||F(x)|| < M ||x|| for 
all xg X . The smallest such M is written ||F || and called 
the bound of F . By definition, ||F || is the smallest real 
number such that 


^(u)-^(v) = (uK')(K l K / )(K>v) = 

= t (u • K‘) K,] ■ [ Kj (K' • v) ]. 

Since u , v g U , (5.8b) shows that this last term is u • v. 
Now we have proved (5.12). But if K( u) = 0 and ug U_ 
then (5.12) implies that u = 0. Therefore, K\U :U ->Y 
being linear, it is an_ injection. Then it is a surjection 
because dim U = dim Y . 

One way to interpret (5.11) and (5.12) is as follows: 
when we compress a data vector y = (yj . • • • ,y° ) to a 
vector y = (y 1 , • • • ,y M ) where y‘ = K‘(y) = K‘ y, we 
lose all the information in Ily^fy). The compressed data 
vector y can be thought of as a vector in U, namely y‘ K, . 
In this way, every compressed data vector "is" a vector in U 
and vice versa. 

Because of (5.12) and (5.13), we note finally that 

II ^(y) II <11 y II if ye Y. (5.16) 


6. TRUNCATORS FOR LINEAR INVERSE PROBLEMS 
WITH DOT PRODUCTS 

In this section we suppose that the inverse problem is 
linear, that the the model space X is a Hilbert space, and 
that the data space Y has been provided with a dot product 
as in section 4. We also suppose that the data function F 
and the prediction function G are continuous. A discontinu- 
ous linear function on X is unbounded (Halmos, 1951), so 
arbitrarily small changes in its argument can produce arbi- 
trarily large changes in its value. If F or G is discontinu- 
ous, then a stronger prior quadratic bound for x E is needed 
(Backus, 1989) for a sensible inversion. 

In the class of problems just described, there is a tool 
ready to hand for constructing a large class of truncated 
approximations. We call this tool a truncator. It is a func- 
tion E whose domain is the set of pairs (a,P) of real 
numbers such that 

(6.1a) 


l|F (x) || < ||F || ||x || for all xgX. (6.2) 

Since F is bounded and Y is finite dimensional, F has a 
singular value decomposition (Backus, 1989). That is, there 
are orthonormal bases {x,,x 2 , • • • } and {y 2 • • • , y D } for 
X and Y and non-negative real numbers X u ■ ■ ■ ,X D such 
that 

^(X B ) = Ky« if 1 < n < D, (6.3a) 

F(x„) = 0 if n > D. (6.3b) 

(Note that on the right in (6.3a) there is no implied sum on 
n because in both its appearances n is a subscript.) If 
a>X n for all n , we define S (a, p) to be the space contain- 
ing only 0. Otherwise, we define E (a, P) to be the subspace 
of X spanned by those x „ for which X n > a. To verify 
(6.1), note that if xg X then 

oo 

x= £ (x-x„)x„ so (6.4a) 

n= 1 

F(x) = 2 (x-x„)X n y „ and (6.4b) 

n=l 

II F (x) || 2 = £ X n 2 (x-xJ 2 . (6.4c) 

»=1 

If x g E(a,P), then x • x„ = 0 if either n > D or \ H < a, so 
|| F (x) || 2 > a 2 £ (x-x„) 2 = a 2 ||x|| 2 . (6.4d) 

A=1 

Hence E(a,P) satisfies (6.1b). For (6.1c), suppose xg 
E(a.P) 1 . Then x-x n = 0ifn < D and X n > a, so 

||F(x)|| 2 < a 2 £ (x-x„) 2 < a 2 ||x|| 2 . (6.4e) 

n=l 

Since a < p, E(a,P) satisfies (6.1c). 

The purpose of a truncated approximation of (X,F,G) is 
to make that inverse problem amenable to computation. 
The purpose of a truncator is to provide truncated approxi- 
mations. The truncator described in the preceding paragraph 
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\\F(x)\\ 


is an inauspicious beginning for this enterprise, since con- 
structing it requires computing the singular value decompo- 
sition of an <x>xD matrix. In practice, of course, such an 
infinite process requires some sort of analytic justification of 
a finite approximation. In many inverse problems, our 
knowledge of the data function F provides such analytic 
information in the form of a "pre-truncator", a tool for con- 
structing truncators. A pre-truncator is a function n whose 
domain is the set of all positive real numbers. For any real 
y > 0, n(y) is a finite dimensional subspace of X such that 

II F (x) II < Y || x || if xe IKy) 1 . (6.5) 

That pre-truncators exist is trivial: if 2 is a truncator and 
n(Y) = S(y/2,y). then n is a pre-truncator. But the idea 
is to go the other way. We hope to use our knowledge of F 
to find a pre-truncator analytically. We cannot say more 
about this process here, since it depends on the details of the 
F being studied, and success is not guaranteed. Once we 
have a pre-truncator, constructing a truncator from it is a 
finite computational task. 

To see this, suppose that analysis of F has given us a 
pre-truncator II. To construct a truncator from it, we note 
that for any y>0, dim TI( y) < °°, so there are efficient 
algorithms that provide a singular value decomposition of 
F I II(y) (Golub and Van Loan, 1983; Press et al, 1992). 
From such a computation we obtain orthonormal bases 
(*i, ■ • • ,xa, ( t) } and {ft , • • • ,y D ) for I1 (y) and Y and 
non-negative real numbers X, lt • • • such that 

F(x„) = K ft, if 1 < n < min [D ,N(y)], (6.6a) 

F(x„) = 0 if min[D ,N(y)] < n < N(y). (6.6b) 

The x„ and y„ in (6.6) will usually differ from those in 

(6.3) . Now we construct from n a truncator 2 as follows: 
given any a and P with 0 < a < p, choose y so that 

P = (a 2 +Y 2 ) 1/2 - (6.7) 

Compute the singular value decomposition (6.6) of the 
analytically constructed space n(y). If a > "K n for all n , let 
2(a,P)= (0). Otherwise, let 2(a,p) be the subspace of 
n(Y) spanned by those x n in (6.6) for which A.„ > a. 

To prove that the 5 so defined is a truncator, we appeal to 

(6.4) . Those equations apply to the present situation if in 
(6.4a) we replace °° by N and in (6.4b-e) we replace D by 
min [D,N(y)]. Therefore (6.1b) follows from (6.4d). It 
remains to prove (6.1c). To that end, suppose 
xeE(a,P) i . Then we can write x = x y +x^ L where 
XyG 2 ( o , |3 ) ■*■ o n(y) and x^-e n(Y) 1 . It follows from 
(6.4e) that 

||F(x y )||<a||x T || 

and from (6.5) that 

l|F(x Y L )|| < yIIx^II. 

But F(x) = F(Xy) + F(Xy) so, by the triangle inequality. 


<||F(x y )|| + ||F(x y 1 )||. Therefore 

||F(x) || < allx^H+Y llx^H if x = x t +x^ l . (6.8a) 

Since x^ • x^ = 0, we have 

l|x|| 2 = ||x y || 2 + Hx^l 2 . (6.8b) 

Under the constraint (6.8b), the right side of (6.8a) has max- 
imum value ||x || (a 2 -t- y 2 ) 1/2 . Then an appeal to (6.7) 
proves (6.1c). 

It remains to show how to obtain truncated approxima- 
tions from a truncator. Suppose that 2 is a truncator for the 
linear inverse problem (X,F ,G). For any real a,p satisfy- 
ing 0 < a < p, we define an ordered quintuple 
TA = (X TA , P TA Fta . Qta *Gta ) as follows: 

%ta = — ( a . P) . Fta- n* TA . F ta = F \X TA , Q ta = H r $Sh A = G 

where Y TA = F (X TA ) and and Ily M are the orthogonal 
projectors of X onto X TA and of Y onto Y TA . "\Ve claim that 
TA is a truncated approximation to (X ,F ,G). To show this 
we must prove (3.2). Equations (3.2b) are properties of any 
orthogonal projectors, and one consequence of (6.1b) is that 
Fta ({0)) = (0), which proves the existence of the inverse 
function (3.1a) (Halmos, 1958). That function is continuous 
because it is linear and its domain is finite dimensional. 

The functions P TA and Q TA are continuous because they are 
bounded; indeed, \\P TA || = \\Q TA || = 1. 

In the geomagnetic inverse problem, and perhaps in oth- 
ers, it happens that 8* y = 8* yi + 8* y 2 , where 8^y, and 
h R y 2 are independent random errors, and the variance matrix 
V i of 8* yi is easy to invert while the inverse of V , the full 
variance matrix of 8 s y, is obtainable only through heavy 
computation. In this case it is useful to construct a pre- 
truncator Ilj based on Then (4.6) assures us that Ilj is 
also a pre-truncator for V. Thus Ift provides a finite- 
dimensional space on which to carry out numerically the 
singular value decomposition (6.6) required to find a trunca- 
tor for the full variance matrix V. 

7. PREDICTION ERRORS IN LINEAR INVERSE 
PROBLEMS WITH DOT PRODUCTS 

In this section we examine the error 8z defined in (3.7) 
when the truncated approximation takes the form (6.9). 
Specifically, we need only the following assumptions about 
the truncated approximation (X TA yP ta . Fj A , Qta , Gta)'- 
First, the model space X is a Hilbert space and the prior 
information about the correct model x £ is the bound (4.4). 
Second, the probability measure % has been used as in sec- 
tion 4 to provide a dot product on the data space Y. Third, 

X TA is a finite dimensional subspace of X, and P TA is the 
orthogonal projector of X onto X TA while Q TA is the orthog- 
onal projector of Y onto F (X TA ), i.e. onto Yta • 
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We define Pta 1 and Qta 1 to be the orthogonal projectors 
of X onto X ta l and of Y onto Y TA X . Thus 

P TA 1 = P X i:X^X TA \ Qta 1 = P y i'Y -*Y T ff*L\) 

a TA 1 TA 

Then f T and gp in (3.3b) are 

f T =-F o Pt A X > 8t = -GoPta (7.2) 

Since H T in (3.6b) is linear, we can replace (3.7b) by (3.13) 
where now DH T = H T . Thus the 8 z in (3.7b) is 

8 z = 8 j-z + 8 S z + 8s z where (7.3a) 

8 R z = H T (& R y), 8 s z = H T (b s y), and (7.3b) 
8 T z = (H T oF -G)oP TA ± (x E ). ( 7 . 3 c ) 

Here 8 fi z is the random error in predicting z E produced 
by the random error & R y in the data; 8 s z is the systematic 
error in predicting^ z £ produced by the systematic error 8 s y 
in the data; and S T z is the total truncation error, the sys- 
tematic error in predicting z E produced by the truncated 
approximation . 

First we consider the random error 8 * z. From (3.14) its 
probability measure on Zp is 

C*=*l*o Hf\ (7.4) 

The entries of its variance matrix, £( 8 s z‘ 8 R z J ), can be cal- 
culated as follows. Let c‘:Z->Re be the linear functional 
on Z that assigns to each z = (z 1 , • • ■ ,z p )e Z its i’th 
entry, z‘, so c‘ (z) = z‘. Then c‘ o ll r : y~» Re is a linear 
functional on Y. Since Y is a finite dimensional dot product 
space, there is a unique vector H T ‘ e Y such that 
c‘ o H T (y) = H r ‘ ■ y for all y g Y (Halmos, 1958). In partic- 
ular, then, 

8 *z* = H r ‘- 8 *y. (7.5) 

Combining (7.5) with (4.8), we conclude that 

£( 8 «z‘ 8 «z>) = H r ‘H r L (7.6) 

We note that if ye Y TA l then H T {y) = 0, so H r ‘ y = 0. 
Therefore H/ e (Y TA 1 ) 1 = Y TA . Thus in (7.6) the vectors 
Hr*' and H T J are members of Yta, a subspace of Y whose 
dimension is dim Yta, i-e. dim X TA . Usually D > dim X TA , 
so this fact provides a computational saving in (7.6). 

About the systematic error 85 z we can say in general only 
that its confinement set is H T (V S ) where Vj is the known 
confinement set of the systematic error in the data, 8 s y. If 

V s is convex or a convex solid polyhedron or a solid ellip- 
soid centered on 0, so is Hp( v s)- Appendix C shows how 
to find Up (Vs) when V s is an ellipsoid. 

The confinement set for the systematic error 8 r z is also a 
solid ellipsoid centered on 0 , because the confinement set 
for x E is the solid unit ball £*(1) centered on 0 in X (see 
4.4). Again appendix C shows how to find that set. 

The result that we provide to both subjectivists and objec- 
tivists is that with the truncated approximation (6.9) z E can 


be estimated as z 0 = H T (y 0 ). The total error 8 z = z 0 - z E in 
this estimate will_be the sum of a random error 8 *z and a 
systematic error 8 T z+8 s z. The random error has probabil- 
ity measure (7.4) on Zp, and has variance matrix (7.6). The 
systematic error has as its confinement set 

8 T z+8 s ze (H t o F -G )o P TA - L (£ x (l))+// r (F s ).(7.7) 

If V s is an ellipsoid, so is H T (V S ), and then (7.7) is the sum 
of two ellipsoids. This need not be an ellipsoid; the sum of 
a ball and a needle is a sausage. 

At this point objectivists can easily calculate confidence 
sets for 8 z at all confidence levels, and will regard the 
inverse problem as solved. If dim Z > 3, subjectivists can 
either temporarily become objectivists and accept confidence 
sets as a solution to the inverse problem, or they can be 
content to describe Sz as the sum of a random error with 
known probability measure and a systematic error with 
known confinement set. Appendix A suggests that only if 
dim Z <2 will most subjectivists want to exploit their wil- 
lingness to use a personal probability measure to describe a 
confinement set. In that case, they gain accuracy by consid- 
ering 8 r z and 8 S z separately, rather than combining them as 
in (7.7). Subjectivists can then separately soften the two 
hard bounds 

8pze (G -H r oF )o P TA 1 (B X (\)) , 8 s zeH r (V s ), 

obtaining two personal probability measures, Cr and C s > fo r 
87 - z and 8 s z in Zp. Subjectivists could probably convince 
each other that 8 s z, 8 s z and 8 r z are independent random 
variables, so they can calculate their personal probability 
measure for 8 z in Zj by convolving t , R , Cj and t, T . More- 
over, they can then calculate the variance matrix^ of 8 z as 
the sum of the variance matrices of 8 R z , 8 s z and 8 t-z. 

Equations (6.9) make clear that a single truncator E gen- 
erates a truncated approximation for every pair (a,P) satis- 
fying (6.1a). Each of these truncated approximations pro- 
duces different hard and soft bounds for the systematic and 
random parts of 8 z. Thus, (a,P) can be varied in an attempt 
to optimize these bounds. If dim Z > 1, what is optimal 
will depend on the purpose to which the inversion is put If 
dim Z = 1, then an objectivist inversion will usually produce 
an interval in Re as the confidence set, and (a,P) can be 
chosen to minimize the length of this interval. A subjec- 
tivist inversion with dim Z - 1 will produce an interval as 
the hard bound and a probability measure as the soft bound. 
The subjectivist might want to choose (a,P) to minimize the 
sum of the interval half-length and the standard deviation of 
the probability measure. Thus objectivists and subjectivists 
will perform the same computations but will interpret them 
differently. 

8 . CONCLUSIONS 

In geophysical inversion the model space X is usually 
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infinite dimensional, and the vector space Y of possible data 
vectors is necessarily finite dimensional. Therefore it is 
essential to recognize that the observed data vector y 0 can- 
not provide a complete description of the correct model x £ ; 
y 0 can predict only a finite number of numerical properties 
of x £ , which we collect into a finite dimensional prediction 
vector z E . By focussing on the truncated approximation of 
the model space and the data function we have constructed a 
general scheme for solving such inverse problems. It turns 
out that the truncated approximation must include not only a 
truncated model space X TA but also two projectors, a model 
projector P TA that maps X onto X TA and a data projector 
Q ta that maps Y onto Y TA . Here Y TA =F(X TA ), and 
F : X —> Y is the data function for the forward problem. 

Our inversion scheme does not require a commitment to 
either the subjectivist or the objectivist interpretation of pro- 
bability; both camps can use our results. The scheme does 
require a careful distinction between random and systematic 
errors in the data and prediction vectors, and between proba- 
bilistic and deterministic prior information about x £ . Con- 
cerning the random error 5* y in y„ we know only that it can 
be regarded as drawn at random from a population in the 
data space Y whose probability measure % is known. (The 
question of estimating a few unknown parameters in 
from the data is discussed by Backus, 1989.) About the sys- 
tematic error 5 s y we know only that 5,jye V s , a known 
subset of Y called the confinement set for 8 5 y. There are 
also truncation errors Sr-y and 87Z produced in trying to 
compute the correct data vector y E and the correct predic- 
tion vector z E from Pj A {\ E ), the truncated form of x £ . 

The input to our inversion consists of %, V s , y 0 and 
prior information about x £ . The latter can be a confinement 
set U s cX for x £ or a probability measure (i £ describing 
where x £ was likely to be in X before the data vector y 0 
was measured. The output of the general inversion scheme 
is an estimate z 0 for z £ and an expression for the error, 
8z = z 0 -z £ , in the form 8z = Az+S^z, where Az depends 
on 8 £ y, 8 s y, 8 r y and y 0 . In this general form the inversion 
is of interest mainly to objectivists, because it can be used 
to find confidence sets for 8z but is not easily adapted to 
untangling the non-linear interaction of random and sys- 
tematic errors that is of interest to subjectivists. Subjectivists 
cannot "soften" the confinement sets to probability distribu- 
tions because almost certainly dim Y is so large that such 
softening introduces spurious extra information. 

If the inverse problem and the truncated approximation 
are linear, 8z = 8 £ z + 8 s z + 8 r z, where 8 £ z is the random 
error produced by 8* y, 8 S z is the systematic error produced 
by 8 s y and 8 T z is the total truncation error produced by 8 r y 
and 87 z. If the prior information about x £ is a confinement 
set, 87 z is a systematic error. If that prior information is a 
probability measure for x £ , then 87 z is a random error. 
These errors are given a form that makes it easy for objec- 
tivists to calculate confidence sets. If dim Z > 3, subjec- 


tivists who do not want to introduce spurious "data" through 
the inversion process may want to leave 8z as the sum of a 
random error with known probability measure and a sys- 
tematic error with known confinement set. If dim Z < 2 , 
many subjectivists will be willing to "soften" the 
confinement set to a personal probability distribution and 
combine the systematic and random parts of Sz into a single 
random error. We conclude that if the systematic errors are 
comparable to or larger than the random errors it usually 
makes no sense to try to estimate correlations between pred- 
ictions. The sole exception is the subjectivist making at 
most two simultaneous predictions. 

We can always introduce on the data space Y a dot pro- 
duct that makes the variance tensor of 8 £ y equal to the 
identity tensor on f. If the prior information about x £ is a 
quadratic bound, that bound generates a dot product under 
which X can be completed to a Hilbert space. When both 
these dot products are available and the inverse problem is 
linear, the truncated approximation required for the inver- 
sion can be described explicitly. The main computational 
burden is the singular value decomposition of an NxD 
matrix, where N is the dimension of the truncated model 
space X TA and D = dim Y . In some inverse problems, the 
value of D can be very considerably reduced by replacing 
y 0 with K ( y 0 ), where K.Y-+Y is a linear "compression 
function" chosen so that dim Y c D . 
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APPENDIX A: RISKS IN GAUSSIAN BOUND SOFTENING 

In spaces of high dimension, "softening" a quadratic 
bound on a vector x by imitating the bound with a personal 
probability measure generates very precise information about 
x that comes not from the bound but from the softening pro- 
cess (Backus, 1988b). For spaces of low dimension, the 
extent of this problem can be studied quantitatively in the 
special case that the personal probability measure is gaus- 
sian. In the present appendix, we will consider a vector x 
about which we know only that it belongs to a real dot pro- 
duct space X of dimension N< <*> and that 

11*11*1. (A.l) 

When N is not extremely large, what are the consequences 
of trying to imitate (A.l) with a gaussian personal probabil- 
ity measure on X ? How much new "information" does this 
bound softening generate? 

Unless the density p(x) of the personal probability meas- 
ure is a function of ||x|| alone, it will single out some 
directions in X , generating new "information" unjustified by 
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(A.l). Therefore, since p is gaussian, there must be a con- 
stant y such that 

p (x) = (y!2n) NI2 exp(-Y || x || 2 /2). (A.2) 

It follows that yII x || 2 has a chi-square distribution with N 
degrees of freedom. The Bayesian literature discusses at 
length but does not settle how to choose y (Jackson, 1978; 
Gubbins, 1983; Backus, 1988a). We will leave y 
unspecified since our argument is independent of its value. 
We will study the random variable 

•r = Yll x II 2 /2 . (A.3a) 

The probability density function for s is (Kendall and 
Stuart, 1977, p397) 

p(N,s) = T(N/2)- 1 s ni2 - 1 e~ s . (A.3b) 

The mean and variance of s are both N 12, and, for large N , 
p(N ,s) is asymptotically gaussian with that mean and vari- 
ance. Therefore for large N 

l-a(2/N) 1/2 < 2s IN < l + a(2 IN) 1 ' 2 (A.4) 

with probability <)> = erf(a/V2). Thus (A.4) is true at 
confidence level <|). But whenever (A.4) is true, the ratio of 
the largest to the smallest admissible s is 

r(N,0) = [l + a(2/N) 1/2 ]/[l-a(2/N) 1/2 ]. (A.5) 

Since s is a constant multiple of || x || 2 , at confidence level 
<]> we know || x || 2 to within the factor r(yv,<|>). For fixed (|> 
and large N, r (N , 0) is very close to 1. Therefore, soften- 
ing (A.1) to (A.2) enables us to extract, apparently from 
(A.l) and at confidence level <|>, the actual value of || x || 2 
with a very small per cent error. This paradox is the objec- 
tion to bound softening when dim X is large. Softening the 
upper bound (A.1) introduces a lower bound for || x|| that is 
generated entirely from the belief that the upper bound can 
be described probabilistically. 

The question remains, whether such a paradox persists 
when N is not large. To pursue this question, let us sup- 
pose that (4.3b) is the probability density for s . Let us fix 
N , 0 and s B , and let us choose s T (N ,§,s B ) so that with pro- 
bability 0 

s B < s < s T . (A.6) 

If (A.6) holds, then we know s to within the factor 

r (N,0,s b ) = s T (N ,$,s B )/s B . (A.7) 

If we minimize r (N , 0, s B ) with respect to s B , we obtain the 
boldest assertion permitted by (A.2) for the given N and 0: 
at confidence level 0 we know || x || 2 to within a factor 

r (N,0) = min s T (N ,ty,s B )/ s B , (A.8) 

the minimum being taken with respect to s B . Table 1 gives 
r (N , 0) for various dimensions N and confidence levels 0. 
That table means, for example, that if N = 30 then replacing 


(A.1) by (A.2) and choosing y provides us with the value of 
|| x || 2 to within a factor of 2.5, 3.0 or 5.4 according as we 
accept a confidence level of 0.9, 0.95 or 0.99. This conclu- 
sion is independent of the value we choose for y. 

TABLE 1. The factor r (N,0) to within which 
|| x || 2 is specified by gaussian softening of a 
hard quadratic bound, as a function of the 
dimension N of x and the acceptable confidence 
level 0 


N \ 0 

0.5 

0.9 

0.95 

0.99 

1 

9.8 

520 

2500 

8400 

2 

4.4 

47 

112 

760 

3 

3.2 

19.8 

38 

150 

4 

2.7 

12.4 

21 

64 

5 

2.4 

9.2 

14.5 

38 

10 

1.85 

4.6 

6.2 

11.4 

20 

1.54 

2.9 

3.5 

5.3 

30 

1.42 

2.4 

2.8 

3.9 

50 

1.31 

1.94 

2.2 

2.8 

100 

1.21 

1.59 

1.74 

2.1 

1000 

1.06 

1.16 

1.19 

1.26 

oo 

1 

1 

1 

1 


If (A.l) really is all we know about x, purists of the 
objective school of probability will use table 1 to reject 
(A.2) as a substitute for (A.1), whatever the value of N. 
Subjectivists who accept hypotheses verified at a confidence 
level of 0.95 might be willing to substite (A.2) for (A.1) 
when N = 1 or 2, especially if they could convince them- 
selves that they really were able to estimate || x || 2 to within 
a factor of 2500 or of 112. When N > 3, softening intro- 
duces a lower bound on || x || 2 high enough that whether to 
soften (A.l) to (A.2) begins to be a personal matter. Of 
course this is not a drawback for subjectivists. In any case, 
Table 1 will provide each subjectivist with a quantitative 
basis for deciding whether to soften the hard bound (A.1). 

There is one more argument against softening a hard 
bound when N > 3. The density function (A.3b) vanishes at 
s - 0 for N > 3 but not for N = 1 or 2. Thus, accepting 
(A.2) when N > 3 really does express a disbelief in values 
of || x || 2 considerably less than 1. 

APPENDIX B: NOTATION 

If U , V and W are sets, peV means that p is a member 
of V, and UczV means that U is a subset of V . If 
U\, ■ • • ,U N are sets then U , x • • • xU N , their Cartesian 
product, is the set of all ordered N-tuples (u { , ■ ■ ■ ,u N ) 
with u; e Ui , i = 1 , • • ■ ,N. The set V u W consists of 
all objects belonging to V or W or both. The set 
V n W consists of all objects common to V and W . The set 
W \ V consists of all objects in W and not in V . The set 
with no members is written 0 . The set whose only 
members are « lt • ■ • ,u N is written {«!,•■• ,%}. The 
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set whose members are all the subsets of X is written 2 X . If 
U and V are arbitrary subsets of a vector space Y, then 
U + V is the set of all vectors in Y that can be written as 
u+v for at least one ue U and one ve V. The set U -V 
is defined similarly from u-v. The sets [u}+V r and { u } — V 
are usually abbreviated as u + V and u - V . If U is a subset 
of vector space Y then span U is the set of all finite linear 
combinations of members of U ; span U is a subspace of 
Y. 

If X and Y are sets, a function / :X — > K is a rule that 
assigns to each xeX a unique value f(x)e Y ; mul- 
tivalued functions are disallowed. The sets X and Y are 
called the domain and codomain of / , and x is the "argu- 
ment" of /. The notation f(x) always stands for a value of 
the function / , never for the function itself. The function is 
sometimes written / ( • ). Thus, for example, if 
f :U xV -*Y and u 0 e U and v 0 e V, then f (u 0 ,):V-+Y 
and /(•,v 0 ):t/->T are defined as follows: 
f(u 0 ,-)(v)= f(u 0 ,v) for all v e V, and /(•,v 0 )(u) = 
/ ( u , v o) for all u e U . If U cX then / ( U ) is the set of all 
y e Y that can be written f (x) for at least one x e U . The 
set /(X) is the "image" of f:X->Y. If V czY then 
f~ l (V ) is the set of all x e X such that / (x)e V. The 
notation does not require or imply that the inverse function 
f~ x :Y->X exists. However, the function / -1 :2 r -»2 x 
always exists. When / -1 exists in both senses, the two 
meanings of f~ x ( V) agree. If for each ye/ (X), f~\{y }) 
contains only one member, / is an "injection". If 
/ (X) = Y, f is a "surjection." If / is both, it is a "bijec- 
tion," and then it has an inverse, f~ l : Y — >X. 

If /:X->X and U <zX then /It/, the "restriction" of/ 
to U, is that function g:U->Y such that g(x)=f(x) 
whenever x e U . If xtU , g(x) is not defined. If A ,B ,C 
are sets and p:A^>B and q:B-^C then the "composite" of 
p and q, written qop , is that function qop :A-*C such 
that for each xeA, ( qop)(x ) -q(p(.x)). Usually 
(qop)(x) is abbreviated qop(x). If D is another set and 
r:C-»Z> is another function, clearly ro(qop) = (roq)op. 
If U is any set then the identity function on U , I v :U -* U , 
is defined by requiring that l u (u) = u for each ueU. If 
f:U-^V then clearly / = I v of = /o/y . If / : t/ F has 
an inverse, / -I :F— >1/, then /o/ -1 = /y and / -1 o/ =/y. 
We denote the set of all real numbers by R. A real-valued 
function /:[/-> R is a "functional" on U. If Y is a real 
vector space, X is any set, and /:X->y, g:X-*Y, and 
a , b e R, we define the function (af +bg):X ->Y by 
requiring for each xeX that 

( af+bg)(x) = af(x)+bg(x ). (B.l) 

Now we turn to probability measures. If X is any set, a 
"CT-algebra” on X is an object I* with these properties: i) 
Xx is a set whose members are some of the subsets of X ; ii) 
XeZ*; iii) if U,V € S* then U \V e ; iv) if V L e 5^ for 
i = 1,2, • • • then V l <uV 2 u V 3 u ■ ■ • e Z*. A probability 


measure on X is a pair ( p , L* ) in which X* is a o-algebra 
on X and p: 1% — »R. The function p must have these addi- 
tional properties: i) p(X) = 1; ii) p(t/)>0 if U e iii) p 
is countably additive; that is, if V U V 2 , • • • eE* and 
Vi r\V: = 0 for all i * j then p(V 1 uV 2 u ' / 3 u • • • ) = 

oo 

£ p( V, ). The sets belonging to L x are the "p-measurable" 

i=l 

subsets of X. Halmos (1950) gives the theory of integration 
with respect to measures. If (P.X* ) is a probability meas- 
ure on X and if the function / :X-»R is p-integrable, then 
its integral is written J x dp(x) f(x). We abbreviate the 
integral as £(p,/) and call it the expected value or mean 
of / with respect to p. When there is no danger of confu- 
sion, we write simply E (/ ). 

Suppose that X* and 1 Y are a-algebras on X and Y , and 
F:X-*Y . We say that F is "measurable" if F~’(K)e X* 
whenever V e Xy. If F is measurable, it defines a function 
F ~ x : Xy — > X* that exists whether or not F has an ordinary 
inverse F~ l :Y->X. If (p,X* ) is a probability measure on 
X and F:X->Y is measurable and tj = poF' 1 , then 
(Tl.Zy) is a probability measure on Y. For any V e Xy, 
71(10 = p(F“’( V)). The measure t| is "induced" on Y by 
F and p. Induced measures give a succinct formula for 
changing variables of integration. If / : Y R then 
£(p,/oF) = £(Ti,/),or 

£(p,/oF) = E(poF-\/). (B.2) 

In the present paper if Y is a finite-dimensional real vec- 
tor space, then Xy will always be the set of all Borel subsets 
of Y\ that is, Xy is the smallest a-algebra that includes as 
members all open subsets of Y . Therefore, if Y and Z are 
finite-dimensional real vector spaces and //: F— >Z is con- 
tinuous, H is measurable (Halmos, 1950). In particular, if 
H is linear it is measurable. 

A dot product on a vector space X is a symmetric, 
scalar-valued bilinear form on X. A dot product space is a 
vector space together with a particular dot product. A real 
Hilbert space is a real dot product space which is complete 
in the norm defined by the dot product. The following 
remarks paraphrase Halmos (1951). For any subset U of a 
real dot product space X we define U 1 , the orthogonal com- 
plement of U, to be the set of all xeX such that x- u = 0 
for every ue (/. If X is a Hilbert space, U 1 is a closed 
subspace of X . If U is a closed subspace of Hilbert space 
X then ( U 1 ) 1 =U , and for each xeX there are unique 
vectors u(x)e U and u I (x)e U 1 such that x = u(x)+u 1 (x). 
Then we can define a function n u:X—>U by setting 
n (/ (x) = u(x) for all xe X. This function is the orthogonal 
projector of X onto U. It is linear, idempotent [i.e. 
Ily oily =!![/] and symmetric [i.e. x,-n l/ (x 2 ) = 

x 2 n (/ (x 1 )]. Clearly, Ily o = ri^io Fly = 0, and 

ny +n u i = i x - 

If Y and Z are finite-dimensional real dot product spaces 
and F :Y->Z is linear, then F J \ Z-*Y is the unique linear 
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function such that for every yef and every z e Z 

F(y)-z = yF r (z). (B.3) 

F r is the "adjoint" or "transpose" of F . Clearly 
(F r ) T =F. If either F or F J has an inverse, so does the 
other, and (F r ) -1 = (F~ l ) r . In that case we abbreviate both 
(F r ) -1 and (F _1 ) r as F~ T . If W is another finite- 
dimensional real dot product space and G : W -> Y is linear, 
then (FoG) r = G T oF T . 


APPENDIX C: VIRIAL BOUND ON MAGNETIC ENERGY 

In this appendix our goal is to prove that 

(2PO)- 1 f WB 2 <(8kGT 1 \ Wg 2 (C.l) 

J R 3 J R 3 

where p<, is the permeability of vacuum, B is the geomag- 
netic field, G is Newton’s universal constant of gravitation, 
g is the earth’s gravitational acceleration field, and R 3 is all 
of three-space. 

There is an appealing physical argument for (C.l). The 
left side of that inequality is E B , the energy of the geomag- 
netic field, while its right side is -E G where E G is the grav- 
itational self energy of the earth. The total energy of the 
earth is E B +E G +K + U , where K is kinetic energy (mostly 
rotational) and U is internal energy. If we imagine the 
earth to be a perfect electrical conductor, and disassemble it 
by expanding it uniformly to infinity at constant angular 
momentum, then each of the four terms in the total energy 
will tend to 0. The assembled earth presumably has a 
smaller energy than its disassembled state, and K and U are 
positive, so we have (C.l). 

A more formal proof starts with the pre-Maxwell equa- 
tions, obtained by neglecting the displacement current. The 
momentum equation for the matter in the earth is then 

pD,u = VT+JxB + pg. (C.2) 

where p is the density of matter, u is its velocity, T is its 
stress tensor, and J is the electric current density. If 3, 
denotes the time derivative at a point fixed in space, then 
D, = 3, + u ■ V is the time derivative at a particle moving 
with the material. Therefore, if r is the position vector, then 
u = D, r, from which follows 

r D,u = l AD, 2 r 2 -u 2 . (C.3) 

Another appeal to the pre-Maxwell equations yields the well 
known result that 


JxB = V-M (C.4a) 

where the Cartesian components of the magnetic stress ten- 
sor M are 

M ij = v^'(B‘B j -V2B 2 & ) . (C.4b) 

It is less well known but equally useful that 

pg = V • G (C.5a) 


where the Cartesian components of the gravitational stress 
tensor G are 

G ij = -(AkG )“’ (gig* - Yig 2 ^). (C.5b) 

Equations (C.5) follow from V • g = -4 nG p and V x g = 0. 

We write the Cartesian components of r as r‘ or r, and 
those of V as 3' or 3,- , so that we can use the Einstein sum- 
mation convention. The Cartesian components of (C.2) can 
now be written 


pD t u‘=djQ ,j where (C.6a) 

Q‘i = T‘‘ +M ij +G‘i . (C.6b) 

If we multiply (C.6a) by r, and sum on i , then an appeal to 
(C.3) and the product rule for derivatives gives 

KpD 2 r 2 -pu 2 = 3 j (r,Q‘t )-8„- Q ij . (C.7) 

We want to integrate (C.7) over all of R 3 . If 3V is the 
surface of the region V occupied by the earth, we can 
integrate (C.7) over V and R 3 W separately and use Gauss’s 
divergence theorem to cancel the contributions of 3 y ( r i Q ‘ j ) 
from the two sides of 3V. Thus we obtain 


ViJ dV pD, 2 r 2 = J dVpu 2 -] dVfyQ 1 *. (C.8) 

Jr 3 J r 3 

For any scalar field / , we can write f dV pf as [ dm f 
where dm =pdV is an element of mass. If dm is always 
the same parcel of mass in the moving material, dm is 
independent of time, so (d/dt)j dm f - j dmD,f. 
Therefore R R 


(C9 > 

Throughout most of the earth, =-3 p to high accu- 
racy, and =-B 2 /(2po) and 8 tj G iJ - g 2 /(SnG). 

Thus, applying (C.9) twice to the left side of (C.8) and not- 
ing that p and p vanish outside V, we obtain 

±j^j v dVpr 2 =2K+3j v dVp +E m +E g . (C.10) 


The left side of (C.10) is negligible compared to the pres- 
sure integral on the right, even in a large earthquake, and 
p > 0, K > 0, so (C.10) implies (C.l). 

Equation (C.10) is the scalar virial equation. For other 
derivations of versions of it, see Chandrasekhar (1961, p577) 
and Parker (1979, p58). 


APPENDIX D: LINEAR IMAGES OF SOLID ELLIPSOIDS 

In this appendix we show that the linear image of a solid 
ellipsoid centered on the origin is another such ellipsoid, and 
we discuss briefly how to find the image. This image ques- 
tion arises in several contexts when we calculate 
confinement sets, so we will try to keep the discussion as 
general as possible. 
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We begin with a real vector space X about which we 
know nothing else. Its dimension may be finite or infinite, 
and it may or may not have a topology, a norm or a dot 
product. In such an "unfurnished" vector space the simplest 
definition of a solid ellipsoid centered on 0 is this: let 
Q :X xX->Re be any positive definite symmetric (pds) bil- 
inear form on X, just as in (4.1). Let El (X , Q ) be the set of 
all x e X such that 

G(x,x)<l. (D.l) 

Any subset of X constructed in this way from a pds bilinear 
form on X will be called a solid ellipsoid in X centered on 
0. Given any such ellipsoid, we can use it to furnish X with 
a dot product, namely 

*r*2 = Q ( x i. x 2>- (D.2) 

Under this dot product and its norm, El(Q,X) is the solid 
unit ball consisting of all xe X such that j|x|| < 1. 

To find the linear images of El(Q,X) we will need the 
orthogonal projectors of X provided by (D.2) onto the 
closed subspaces of X . Therefore we ask that X be complete 
in the norm generated by (D.2). That is, X must be a real 
Hilbert space with dot product (D.2). If dim X < °°, X is 
automatically complete. If dim X = <», we must assume 
completeness, and we do so. 

Next, suppose that Y is another real vector space, and 
dim Y < oo. Endow Y with the unique topology that makes 
it a topological linear space (Dunford and Schwartz, 1958, 
p372), i.e. the ordinary Euclidean topology in which conver- 
gence of a sequence of vectors means convergence of all 
their component sequences relative to each basis for Y . 
Suppose that L:X->Y is linear and continuous. (Continuity 
follows from linearity if dimX < °°.) We will show that 
there is a pds bilinear form Q on L(X) such that 

L(El(X ,Q)) - El(L(X),Q) . (D.3) 

That is, L(El(X,Q)) is a solid ellipsoid centered on 0 in 
L(X). 

To find Q let N L be the null space of L, the set of all 
xe X such that L(x) = 0. Since L is continuous, N L is a 
closed subspace of X. Under (D.2), let P x :X—>N L be the 
orthogonal projector of X onto N L , and let P = I X -P X . 
Then P is the orthogonal projector of X onto N L X . Define 

K :P(X)->L(X) as 

K=L\P(X). (D.4a) 

We claim that K has an inverse, 

K-':L(X)^>P(X). (D.4b) 

To prove this, we show that K is both injective and surjec- 
tive. Since K is linear, to show that it is injective we need 
show only that x € P (X ) and K (x ) = 0 imply x = 0. But if 
xe P(X) then xeA^ 1 , while if K(x) = 0 then L(x) = 0 so 
xe N l . Since N L nN L x = {0}, the conclusion follows. 


To prove that K is surjective, suppose that y e L (X ). 
Then y = L(x) for some xeX. Thus 
y = L o P (x)+L o P x (x). Since P x (x)e N L , therefore 
LoP x (x) = 0, and y=LoP(x). But P (x)e P(X ), so 
L o P (x ) = K(P (x )). Thus ye K(P (X)), and K is surjec- 
tive. 

Knowing that X -1 exists, we can now define 
Q'.L(X )xL(X )-> Re as 

e(yi.y2) = Q(^- I (yi),x- 1 (y 2 )) (D.5) 

for all yi,y 2 e L(X). Clearly Q is a pds bilinear form on 
L(X). To_ prove (D.3), first we show that L (El (X ,Q))c 
El(L(X),Q). Suppose that yeL(El(X,Q )). Then 
y = L(x) for some xeX that satisfies (D.l). But 
L(x) = L o P(x) = K(P(x)), so P(x) = K~\ y). More- 
over, HZ’ (x) || < HP || || x || and ||P || < 1, ||x|| £ 1, so 
||K- 1 (y)ll < 1. or I- Hence 

Q(y,y) ^ 1, and ye El(L(X )j2)- 
To prove that _El(L(X),Q)czL(El(X ,QJ) we suppose 
that y e El (L (X ), Q ). Then y e L(X) and Q(y,y) < 1. Let 
x = K -1 (y). Then xeP(X), and 0(x,x)<l. Thus 
y = K (x ) for an x e L (El (X , Q )). This completes the 
proof. 

When dim X = the foregoing calcualtions require sum- 
ming infinite series, and cannot be done exactly on a com- 
puter. The remedy is to construct a truncated approximation 
as in sections 3 and 6, as if (X ,L,L ) were an inverse prob- 
lem. 
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Lucke-Mauersberger-Lowes Power Spectrum 
from Magsat Data 

J. Cain, Z. Wang, D. Schmitz & J. Meyer (1989), Geophysical Journal 97, 443-447 

fi(c,/) = (( + l)'j8j"(c) 2 =-l T J dA(r) I B ; (r)l 2 

m=-l 4to S(c) 


a = core radius = 3486 km 
b = earth radius = 6371 km 



Spherical Harmonic Degree l 
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Finding the Source Depth from the Power Spectrum 


Origin 0 at center of earth, and position vector r = rf, I r I = 1. 


S {b ) = spherical surface of radius b centered on 0 


( f)s(Jj ) = area average of / on S (b). 


nn r) = real spherical harmonic polynomial of degree / in r, — / <m </ 


{ Hf 1 H™ )s(\) = $//' (i.e. H™ fully normalized). 


If B = magnetic field produced by the core, and r >a — core radius, 

then B = — V<{) where for any b 


, , m=l 

J + 1 v> 


4>(r ) = bZ(blry +l 2 g f'{b){7Uir m H’ t n {f). 

1=1 m=-l 


b l+2 gj n (b ) = c l+2 gj n {c ) for all b and c. 


m=l 


<1 B / l 2 ) s(r) = (i+l) Z gnr) 2 = R(r,l) 

m=-l 

= Lucke-Mauersberger-Lowes Spectrum. 


R(r,l) = ( c/r) 1M R(c,l ). 


oo oo 

<lBl 2 ) s(r) = ZR(r,l)= Z(c/r) 2M R(.c,n. 
1 = 1 1=1 
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Isotropically Random Scalar Fields f on the Spherical Surface S(l) 


£[/(?)] = 0 . 

£[/(?)/(§)] = £(?•§) = £[/ 2 ] C(f-§). 

K = autocovariance function of / on S(l) 

C = autocorrelation function of/ on S(l) 


/(f) = £ irrmn 

1=0 m=-l 

EUr/f] = */M W , OS/. I mis/. 

CO 

*40 = X(2l+l)k[P[ (ii). 

1=0 


White Noise 


AT(fs) = 4 tco 2 6( f - § ). 
£[/"/“'] = o 2 8 (r 3 w , OS/. \m\<l. 
k t = o 2 , 0 < / 


APPENDIX B 

White Noise Non-dipole Radial Field B r on S(w) 

All Sources Inside S(w), and r > w 

oo m=l 

B r (r f)= £ £ 7 

1=2 m--l 

r l+2 y™(r) = w l+2 y /"(w) if r > w. 

E[yr(w)yFXw)] = o 2 5 „. 5 w> 2 < l, -l < m < l. 

E [ Y ; m (r ) yf(r ) ] = o 2 (w /r ) 2m 8, r 8 W , 2 </,-/< m < l. 

Y/ m = (/ + 1) (21 + l)~ >/2 g/ m . 

y, m = (7 + 1) 1 ' 2 (2/ + l) 1,2 g, m = (/ + l)~ 1/2 (2/ + 1) Y/ m ■ 

C. Constable and R. Parker, JGR 93, 11569-11581 (1988) 

V. Courtillot, J-P. Valet, G. Hulot & J-L. LeMouel, EOS, 73, 337-342 (1992) 

f(w f ) = -(4 ji) -1 J dA (s) 2 ( f ’§) V (w s) where 

5(1) 

Q 00 = £ (2/ + 1) (/ + ir 3/2 ^ 00 and 
1=2 

Vf = r 2 V 2 -rd 2 r = angular part of laplacian 
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Expected 1 dependence of L-M-L spectrum R(c, 1) if B r is white noise on S(w) 


gf(c ) = (2/ + 1) 1/2 ( / + 1) -1 yj"(c ) 
yj"(c ) = (w /c ) ,+2 y["(w ) 

E W») yF'M] = o 2 5 ir 8 mm . 


m=l _ 

tf(c,/) = (/ + l) 2 g/"(c) 2 

m=-l 

lnR(c,/) + ln(/ + l)-ln(2/ + l)-C; = 
= l In [ (w /c) 2 ] + ln[2(w /c) 4 a 2 ] 

m=l » 

=ln '/2 2 [7/”C>v) /cr] 2 
m=-l 


£[lntf(c,/)] + In (/ + 1) - In (2/ + 1) - £ [ C,, ] = 
= / In [ (w / c ) 2 ] + In [ 2 (w / c ) 4 0 2 ] 

£[£,] = V(/+V4) = In / as / ~ 

Var[ln«(c,/)] = Var[^ ( ] = 9 Z y ( / + '/z ) 

y(z) = 9 Z lnT(z) 


II '» 
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i 

Two-Parameter Stochastic Model of Non-Dipole Radial Field on CMB 

S(a) = core-mantle boundary, a = 3486 km 
S(c) = satellite orbital sphere, c = 6791 km 

°° m=l 

B,(rt)= £ X Y/V )#/”(?)• 

1=2 m=-l 

r l+2 yj n (r ) = c l+2 y[ n (c). 

General Isotropic Stochastic Model 

£[£ r (rr)B r (rs)] =K(r, f-S) 

E[yr(r)yP(r)]=k l (r)d ll , 5 W 

oo 

K{r,[ i)= 2(2/+l)^/(r)P z 0i) 

1=2 

r 2l+4 ki(r ) = c 2/+4 £y(c) 

White Noise Model: kj(w) = a for some radius w 

v = ( w / a ) 2 

K(a,^) = o 2 E(2/+l)v ,+2 ^ ; (n) = is:(a,l)C(a,n) 

/=2 

v = 0.766 ± 0.031 



X'(a,l) = (6±3)x 10 10 nT 2 
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CONCLUSIONS 


1. The MAGS AT data for 2 < / < 12 are well fitted by an isotropic sto- 
chastic model with two adjustable parameters: a power level G 2 and 
an apparent white noise radius tv . 


2. In these models, the / dependence of E[\nR (b , l)] includes a 
digamma function \|/( l + Vi) and some logarithmic terms as well as a 
term linear in / . Also, Var [ In i? ( £> , 0 ] = d z ^ (/ + Vi ) , indepen- 
dent of b , a 2 , or tv . 


3. The white noise radius tv depends on which scalar field is modeled as 
white noise on SC tv). For B r , tv is about 440 km below the CMB. 
This corresponds to a correlation length (half peak width at half 
power) of about 12 degrees or 750 km. 


b = earth radius, B = — V(J> 

<t>(r ) = b'£(blr) l+1 £ g, m (b) (2l+\T m H^t). 

1=2 m =-l 

R(b,l) = (l + l)Zg, m (b ) 2 

m=-l 


\j/(z) = d z lnr(z) 


